Economic Opportunity Gaps in Allegheny County: A Tract-Level Analysis

EDA/Data Visualization (90-800) Final Project

Sofia Hutton | Sdhutton | December 2025


AI Disclaimer

Figure layout and subplot structure adapted with assistance from ChatGPT (EDA/data visualization support). Heatmap normalization and hovertext pattern developed with help from ChatGPT. Mapbox choropleth configuration (geometry → geojson pattern) adapted with assistance from ChatGPT.

Abstract

This project analyzes tract-level opportunity gaps across Allegheny County using ACS (2022) and FFIEC (2022) data. By examining disparities in employment, education, infrastructure access, and housing stability, it identifies where economic conditions diverge most sharply between lower- and higher-income neighborhoods. The analysis is informed by the Regional Economic Connectivity framework from SRI/ICIC, focusing on how well people and places are linked to opportunity. The results highlight specific neighborhoods and indicators that warrant targeted, place-based policy attention.

1. Introduction & Motivation

Why This Project?

Economic mobility depends heavily on place. In Pittsburgh, it’s easy to see how neighborhood-level differences in employment, education, housing stability, and infrastructure shape people’s lived opportunities. This project uses ACS (2022), FFIEC (2022), and tract-level derived indicators to map those disparities across Allegheny County.

The goal is straightforward: identify where opportunity gaps are largest, understand what drives them, and highlight patterns that matter for equitable regional development. The approach mirrors a simplified version of SRI and ICIC’s Regional Economic Connectivity framework, which focuses on how well people, places, and systems are linked within a region.

Personal Connection

I came to this topic from both professional experience and personal curiosity. Before moving to Pittsburgh, I worked in Washington, D.C. on regional economic development projects—work that taught me how policy, infrastructure, and labor markets interact at the metropolitan level. But that work was always somewhat distant; I analyzed regions I didn’t actually live in.

Relocating to Pittsburgh shifted that. As a new resident, I noticed stark differences between neighborhoods and realized how little I understood about the local economic landscape beneath the city’s post-industrial “comeback” narrative. This project became a way to build that understanding using data rather than assumptions.

It also reconnects to the regional connectivity framework I worked with at SRI and ICIC: the idea that strong regions aren’t just prosperous—they are well-connected. When neighborhoods have reliable access to jobs, transportation, broadband, and stable housing, opportunity becomes more evenly distributed. When those connections break down, disparities widen.

By examining tract-level differences in employment, education, infrastructure, and housing, this analysis tries to capture where Pittsburgh’s regional connectivity is strong—and where it frays.

Policy Implications

A tract-level view helps identify which disparities matter most and where targeted interventions could have the biggest impact. If low-income tracts show strong educational attainment but weak digital access, broadband becomes a priority. If rent burden or vacancy rates cluster spatially, neighborhood stabilization strategies may be more effective than regionwide programs.

This kind of diagnostic supports place-based policymaking—interventions that respond to the actual conditions of specific communities rather than assuming every neighborhood faces the same barriers.

Research Questions

  1. How do employment opportunities differ between low-income and high-income census tracts?
  2. What is the relationship between educational attainment and economic outcomes?
  3. Do infrastructure gaps (broadband, transit) correlate with income classification?
  4. How does housing affordability vary across income groups?
  5. Which census tracts face the most severe multi-dimensional opportunity deficits?

2. Dataset Description

Data Sources

2.1 American Community Survey (ACS) 5-Year Estimates (2018-2022)

  • Source: U.S. Census Bureau via API
  • Coverage: 84,415 census tracts nationwide (50 states and the District of Columbia)
  • Specific Focus: 394 census tracts in Allegheny County, PA
  • Access: https://api.census.gov/data/2022/acs/acs5
  • Variables: Labor market, education, housing, income, infrastructure, demographics

2.2 FFIEC Income Classification Data (2022)

  • Source: Federal Financial Institutions Examination Council
  • Purpose: Official income level classifications for Community Reinvestment Act
  • Access: https://www.ffiec.gov/censusapp.htm
  • Classifications:
    • Low Income: < 50% of MSA median
    • Moderate Income: 50-79% of MSA median
    • Middle Income: 80-119% of MSA median
    • Upper Income: ≥ 120% of MSA median
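The four tiers above amount to a threshold lookup on a tract's income percentage. A minimal sketch, assuming the percentage is already available as a number; `classify_tract_income` is a hypothetical helper (not part of the FFIEC data), and "Unknown" here simply stands in for a missing percentage:

```python
import math

def classify_tract_income(pct_of_median):
    """Map tract median family income (% of the MSA median) to an FFIEC-style tier.

    Hypothetical helper: thresholds mirror the list above; missing input -> "Unknown".
    """
    if pct_of_median is None or math.isnan(pct_of_median):
        return "Unknown"
    if pct_of_median < 50:
        return "Low"
    if pct_of_median < 80:
        return "Moderate"
    if pct_of_median < 120:
        return "Middle"
    return "Upper"

# Percentages that appear in the Section 4.5 sample output, plus a missing value
for pct in [103.79, 73.60, 133.41, float("nan")]:
    print(pct, classify_tract_income(pct))
```

On those sample rows the sketch returns the same labels the FFIEC file assigns (Middle, Moderate, Upper).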

2.3 TIGER/Line Shapefiles - Pennsylvania Census Tracts (2022)

  • Source: U.S. Census Bureau Geography Division
  • Purpose: Census tract boundary geometries for spatial visualization and mapping
  • Access: https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html
  • Specific File: tl_2022_42_tract (Pennsylvania state FIPS code: 42)
  • Format: Shapefile (.shp, .shx, .dbf, .prj)
  • Coverage: All Pennsylvania census tracts; filtered to Allegheny County (FIPS county code: 003) for analysis
  • Use: Enables choropleth mapping of income classification and compound disadvantage scores by census tract geography

2.4 (Inspiration) ICIC and SRI's Economic Connectivity Dashboard

  • Source: Initiative for a Competitive Inner City (ICIC) / SRI International
  • Purpose: Influenced analytical approach to characterizing regional success across income tiers
  • Access: https://icic.shinyapps.io/economic_connectivity_dashboard/

3. Initial Hypotheses

H1: Low-income tracts will demonstrate significantly lower employment rates and substantially higher unemployment rates compared to middle/upper income areas, indicating systematic barriers to labor market participation.

H2: Educational attainment, particularly Bachelor's degree completion, will be markedly lower in low-income tracts, with the gap concentrated at the four-year degree level rather than distributed across all post-secondary credentials.

H3: Low-income tracts will experience significantly lower broadband access rates, creating a digital divide that limits access to remote work, online education, telehealth, and essential digital services.

H4: Low-income tracts will exhibit higher housing instability, evidenced by elevated vacancy rates and greater concentrations of rent-burdened households paying 30% or more of income toward housing costs.

H5: Disadvantage will cluster geographically and dimensionally—the same tracts facing low income will simultaneously struggle across employment, education, infrastructure, and housing, demonstrating that economic challenges compound rather than distribute randomly across the county.

4. Data Collection & Processing


The analysis follows these steps:

  • 4.1 System and Census API Configuration
  • 4.2 Retrieve ACS Data (All States)
  • 4.3 Construct Tract Identifiers
  • 4.4 Merge FFIEC 2022 Tract Income Data
  • 4.5 Compute Derived Indicators
  • 4.6 Create Final Visualization Dataset

4.1 System and Census API Configuration

Imports the analysis libraries, loads the Census API key from a local .env file, and defines the Census variable groups for labor markets, education, housing, and infrastructure.

In [1]:
# Library configuration
import os
import warnings

import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import geopandas as gpd
from matplotlib.patches import Patch
from matplotlib.gridspec import GridSpec
from scipy import stats
from dotenv import load_dotenv

# Load environment variables (including CENSUS_API_KEY) from a local .env file
load_dotenv()

# Set the default renderer for Plotly figures
pio.renderers.default = 'notebook'

warnings.filterwarnings('ignore')

# Jupyter inline plotting
%matplotlib inline

# Plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.dpi'] = 110
plt.rcParams['font.size'] = 10

print("✓ Libraries imported successfully!")

# Census API setup
API_KEY = os.getenv("CENSUS_API_KEY")
BASE_URL = "https://api.census.gov/data/2022/acs/acs5"

# Variables to collect (ACS 2018-2022 5-year detailed tables)
VARIABLES = {
    "geographic": ["NAME"],
    # B23025: 002 labor force, 003 civilian labor force, 004 employed, 005 unemployed
    "labor_market": ["B23025_002E", "B23025_003E", "B23025_004E", "B23025_005E"],
    # B15003 (adults 25+): 021 associate's degree, 022 bachelor's degree
    "education": ["B15003_021E", "B15003_022E"],
    # B25003: occupied / owner-occupied units; B25070 007-010: rent >= 30% of income;
    # B25002: total / vacant units; B19013: median household income
    "housing": ["B25003_001E", "B25003_002E", "B25070_007E", "B25070_008E",
                "B25070_009E", "B25070_010E", "B25002_001E", "B25002_003E", "B19013_001E"],
    # B28002_002E: households with an Internet subscription
    "connectivity": ["B28002_002E"],
    # B08301_010E: workers commuting by public transportation (excluding taxicab)
    "transportation": ["B08301_010E"],
    "population": ["B01003_001E"]
}

# Flatten for API call
var_list = [var for category in VARIABLES.values() for var in category]
var_string = ",".join(var_list)

print(f"✓ Configured {len(var_list)} Census variables")
✓ Libraries imported successfully!
✓ Configured 19 Census variables

4.2 Retrieve Census Data

Note: this step may take about one minute to run due to the size of the nationwide request.

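ACS tract-level data is retrieved for all 50 states and the District of Columbia, yielding 84,415 tracts (83,531 of which match an FFIEC income-level record in Section 4.4), well above the project's row-count requirement. Allegheny County subsets are extracted downstream.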

In [2]:
# Retrieve ACS tract-level data for all 50 states plus the District of Columbia ('11')
states = [
    '01', '02', '04', '05', '06', '08', '09', '10', '11', '12',
    '13', '15', '16', '17', '18', '19', '20', '21', '22', '23',
    '24', '25', '26', '27', '28', '29', '30', '31', '32', '33',
    '34', '35', '36', '37', '38', '39', '40', '41', '42', '44',
    '45', '46', '47', '48', '49', '50', '51', '53', '54', '55', '56'
]

# Collect data for all tracts nationwide (requests is imported in the setup cell)
all_data = []

for state in states:
    params = {
        "get": var_string,
        "for": "tract:*",
        "in": f"state:{state}",
        "key": API_KEY,
    }

    # A timeout keeps one stalled request from hanging the whole loop
    response = requests.get(BASE_URL, params=params, timeout=120)

    if response.status_code == 200:
        rows = response.json()
        # The first row of each response is the header; keep it only once
        if not all_data:
            all_data.extend(rows)
        else:
            all_data.extend(rows[1:])
    else:
        print(f"State {state}: request failed ({response.status_code})")

# Final dataset as list of rows (header first)
data = all_data
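The loop above stacks the per-state responses and keeps only the first header row. A slightly stricter variant, sketched below as a hypothetical `combine_census_payloads` helper, also checks that every state returned the same header before stacking, so a changed variable list cannot silently misalign columns. The toy payloads only mimic the API's list-of-rows shape; their values are illustrative:

```python
import pandas as pd

def combine_census_payloads(payloads):
    """Stack Census API JSON payloads (lists of rows, header first) into one DataFrame."""
    header = None
    rows = []
    for payload in payloads:
        if not payload:
            continue  # skip empty or failed responses
        if header is None:
            header = payload[0]
        elif payload[0] != header:
            raise ValueError("Header mismatch between state payloads")
        rows.extend(payload[1:])  # data rows only
    if header is None:
        return pd.DataFrame()
    return pd.DataFrame(rows, columns=header)

# Toy payloads shaped like Census API output
al = [["NAME", "B01003_001E", "state", "county", "tract"],
      ["Tract A", "1865", "01", "001", "020100"]]
ak = [["NAME", "B01003_001E", "state", "county", "tract"],
      ["Tract B", "1861", "02", "013", "000100"]]
combined = combine_census_payloads([al, ak])
print(combined.shape)
```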

4.3 Build Skeleton Dataframe

Create standardized identifiers:

  • tract_6: the 6-digit Census tract code
  • geoid: the 11-digit GEOID, built as state (2 digits) + county (3 digits) + tract_6

These identifiers are required for merging with FFIEC (Federal Financial Institutions Examination Council) income classification data.

In [3]:
# Build DataFrame from API response
df = pd.DataFrame(data[1:], columns=data[0])

# Construct Census tract identifiers
df["tract_6"] = df["tract"].astype(str).str.zfill(6)
df["geoid"] = (
    df["state"].astype(str).str.zfill(2) +
    df["county"].astype(str).str.zfill(3) +
    df["tract_6"]
)

# Convert applicable columns to numeric
exclude = ["NAME", "state", "county", "tract", "tract_6", "geoid"]
numeric_cols = [c for c in df.columns if c not in exclude]

for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

df.head()
Out[3]:
NAME B23025_002E B23025_003E B23025_004E B23025_005E B15003_021E B15003_022E B25003_001E B25003_002E B25070_007E ... B25002_003E B19013_001E B28002_002E B08301_010E B01003_001E state county tract tract_6 geoid
0 Census Tract 201; Autauga County; Alabama 738 732 713 19 84 182 700 519 0 ... 33 60563 643 0 1865 01 001 020100 020100 01001020100
1 Census Tract 202; Autauga County; Alabama 947 919 868 51 86 163 544 429 8 ... 136 57460 427 0 1861 01 001 020200 020200 01001020200
2 Census Tract 203; Autauga County; Alabama 1808 1781 1748 33 284 209 1305 912 9 ... 126 77371 1170 0 3492 01 001 020300 020300 01001020300
3 Census Tract 204; Autauga County; Alabama 1875 1854 1837 17 302 662 1666 1306 0 ... 56 73191 1563 53 3987 01 001 020400 020400 01001020400
4 Census Tract 205.01; Autauga County; Alabama 2504 2400 2386 14 319 763 1783 971 21 ... 74 79953 1759 72 4121 01 001 020501 020501 01001020501

5 rows × 24 columns
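The zero-padding matters because FIPS codes lose their leading zeros whenever they pass through an integer column, as the FFIEC "FIPS code" field can when read from Excel. A small sketch, using a hypothetical `build_geoid` helper and illustrative codes, that pads each part and rejects anything that is not exactly 11 digits:

```python
import pandas as pd

def build_geoid(state, county, tract):
    """Concatenate zero-padded state, county, and tract codes into 11-digit GEOIDs."""
    geoid = (state.astype(str).str.zfill(2)
             + county.astype(str).str.zfill(3)
             + tract.astype(str).str.zfill(6))
    # Any non-digit character or wrong length means a malformed input code
    bad = ~geoid.str.fullmatch(r"\d{11}")
    if bad.any():
        raise ValueError(f"{int(bad.sum())} GEOID(s) are not 11 digits")
    return geoid

# Codes whose leading zeros were lost (as happens when FIPS codes are stored as integers)
parts = pd.DataFrame({"state": [42, 1], "county": [3, 1], "tract": [20100, 20501]})
print(build_geoid(parts["state"], parts["county"], parts["tract"]).tolist())
```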

4.4 Merge FFIEC Income Data

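The FFIEC data assigns each census tract an income level (Low, Moderate, Middle, or Upper) based on the tract's median family income as a percentage of the median family income of its MSA (or the statewide non-metropolitan median for tracts outside an MSA). This classification enables analysis of opportunity gaps across income levels.

Three fields (Tract MFI, tract income as a percentage of the area median, and Income Level) are merged into the main dataframe on the geoid key. Because this is a left merge, tracts without an FFIEC match keep missing values, and value_counts() skips them, so the summaries below cover matched tracts only.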

In [4]:
# Load FFIEC 2022 income classification data
ffiec_path = "/Users/sofiahutton/Documents/Fall 2025 CMU Classes/visualizations with python /CensusTractList2022.xlsx"

# Read FFIEC tract sheet
ffiec = pd.read_excel(ffiec_path, sheet_name="2022 tracts")
ffiec.columns = ffiec.columns.str.strip()

# Build GEOID for merging
ffiec["geoid"] = ffiec["FIPS code"].astype(str).str.zfill(11)

# Identify income-related columns by partial match (handles naming variation).
# Caution: [0] silently takes the first hit; inspect ffiec.columns if several columns match.
mfi_col = [c for c in ffiec.columns if "mfi" in c.lower()][0]
pct_col = [c for c in ffiec.columns if "percentage" in c.lower()][0]
lvl_col = [c for c in ffiec.columns if "income level" in c.lower()][0]

# Standardize column names
ffiec_keep = ffiec[["geoid", mfi_col, pct_col, lvl_col]].rename(
    columns={
        mfi_col: "Tract MFI",
        pct_col: "Tract income percentage",
        lvl_col: "Tract income level",
    }
)

# Remove any existing FFIEC columns to avoid duplicate/suffixed columns on re-run
cols_to_remove = [
    "Tract MFI", "Tract income percentage", "Tract income level",
    "Tract MFI_x", "Tract MFI_y",
    "Tract income percentage_x", "Tract income percentage_y",
    "Tract income level_x", "Tract income level_y"
]
df = df.drop(columns=[c for c in df.columns if c in cols_to_remove], errors="ignore")

# Merge FFIEC indicators into the ACS dataset
df = df.merge(ffiec_keep, on="geoid", how="left")

# Helper function for formatted summary
def print_income_summary(label, subset):
    dist = subset["Tract income level"].value_counts()
    total = dist.sum()
    print(f"\n{label} Income Classification:")
    for level in ["Upper", "Middle", "Moderate", "Low", "Unknown"]:
        count = dist.get(level, 0)
        share = count / total if total > 0 else 0
        print(f"- {level:8s}: {count:5,} ({share:.1%})")

# Nation-level summary (all tracts)
print_income_summary("Nationwide", df)

# Pennsylvania-only summary (state FIPS 42)
df_pa = df[df["state"] == "42"]
print_income_summary("Pennsylvania (state = 42)", df_pa)

# Allegheny County summary (state 42, county 003)
df_allegheny = df[(df["state"] == "42") & (df["county"] == "003")]
print_income_summary("Allegheny County (state 42, county 003)", df_allegheny)
Nationwide Income Classification:
- Upper   : 22,302 (26.7%)
- Middle  : 34,720 (41.6%)
- Moderate: 18,811 (22.5%)
- Low     : 5,430 (6.5%)
- Unknown : 2,268 (2.7%)

Pennsylvania (state = 42) Income Classification:
- Upper   :   813 (23.6%)
- Middle  : 1,630 (47.3%)
- Moderate:   711 (20.6%)
- Low     :   206 (6.0%)
- Unknown :    86 (2.5%)

Allegheny County (state 42, county 003) Income Classification:
- Upper   :   115 (29.2%)
- Middle  :   137 (34.8%)
- Moderate:    83 (21.1%)
- Low     :    38 (9.6%)
- Unknown :    21 (5.3%)
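Two checks could make this merge step more robust. The `[0]` partial matches take the first hit silently; in the Section 4.5 output, "Tract MFI" is 68115.0 for all five Autauga County tracts even though their income percentages differ, which suggests the "mfi" match may be picking up an area-level median rather than the tract median. The sketch below uses a hypothetical `find_one_column` helper with made-up header names (check the real ones with `ffiec.columns`) that fails loudly on ambiguous matches, then shows pandas' `validate` and `indicator` merge options on toy keys:

```python
import pandas as pd

def find_one_column(columns, *keywords):
    """Return the only column whose lowercase name contains every keyword; raise otherwise."""
    hits = [c for c in columns if all(k in c.lower() for k in keywords)]
    if len(hits) != 1:
        raise ValueError(f"Expected one column matching {keywords}, found {hits}")
    return hits[0]

# Hypothetical header names, for illustration only
toy_columns = ["FIPS code", "MSA/MD MFI", "Tract MFI", "Tract income level"]
tract_mfi_col = find_one_column(toy_columns, "tract", "mfi")  # unambiguous match

# validate= raises on duplicate keys; indicator= adds a _merge column flagging unmatched rows
acs = pd.DataFrame({"geoid": ["01001020100", "01001020200", "01001020300"]})
ffiec_toy = pd.DataFrame({"geoid": ["01001020100", "01001020200"],
                          "Tract income level": ["Middle", "Moderate"]})
merged = acs.merge(ffiec_toy, on="geoid", how="left",
                   validate="one_to_one", indicator=True)
print(merged["_merge"].value_counts())
```

Counting `left_only` rows on the real merge would show exactly how many ACS tracts lack an FFIEC record.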

4.5 Calculate Derived Metrics

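Employment, unemployment, homeownership, vacancy, broadband, and transit rates are calculated as ratios of ACS counts, and the four rent-burden brackets (30% or more of income spent on rent) are summed into a single count. Broadband and transit are divided by total population rather than by households or workers, so they work as relative indices across tracts rather than literal shares.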

In [5]:
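# Core tract-level rates (proportions, 0-1)
# Both labor-market rates use the civilian labor force (B23025_003E) as the base,
# so employment_rate = 1 - unemployment_rate by construction
df["employment_rate"]    = df["B23025_004E"] / df["B23025_003E"]
df["unemployment_rate"]  = df["B23025_005E"] / df["B23025_003E"]
df["homeownership_rate"] = df["B25003_002E"] / df["B25003_001E"]
df["vacancy_rate"]       = df["B25002_003E"] / df["B25002_001E"]
# Note: divided by total population, not by households or workers
df["broadband_rate"]     = df["B28002_002E"] / df["B01003_001E"]
df["transit_rate"]       = df["B08301_010E"] / df["B01003_001E"]

# Renter households paying 30% or more of income on rent
# (B25070 brackets: 30-34.9%, 35-39.9%, 40-49.9%, 50%+)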
df["rent_burdened_count"] = (
    df["B25070_007E"] +
    df["B25070_008E"] +
    df["B25070_009E"] +
    df["B25070_010E"]
)

df.head()
Out[5]:
NAME B23025_002E B23025_003E B23025_004E B23025_005E B15003_021E B15003_022E B25003_001E B25003_002E B25070_007E ... Tract MFI Tract income percentage Tract income level employment_rate unemployment_rate homeownership_rate vacancy_rate broadband_rate transit_rate rent_burdened_count
0 Census Tract 201; Autauga County; Alabama 738 732 713 19 84 182 700 519 0 ... 68115.0 103.79 Middle 0.974044 0.025956 0.741429 0.045020 0.344772 0.000000 74
1 Census Tract 202; Autauga County; Alabama 947 919 868 51 86 163 544 429 8 ... 68115.0 73.60 Moderate 0.944505 0.055495 0.788603 0.200000 0.229447 0.000000 66
2 Census Tract 203; Autauga County; Alabama 1808 1781 1748 33 284 209 1305 912 9 ... 68115.0 102.93 Middle 0.981471 0.018529 0.698851 0.088050 0.335052 0.000000 190
3 Census Tract 204; Autauga County; Alabama 1875 1854 1837 17 302 662 1666 1306 0 ... 68115.0 110.95 Middle 0.990831 0.009169 0.783914 0.032520 0.392024 0.013293 8
4 Census Tract 205.01; Autauga County; Alabama 2504 2400 2386 14 319 763 1783 971 21 ... 68115.0 133.41 Upper 0.994167 0.005833 0.544588 0.039849 0.426838 0.017471 398

5 rows × 34 columns
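Two details of these ratios are worth making explicit: a zero count in the denominator yields NaN or inf, and the choice of denominator changes what the number means. A minimal sketch with a hypothetical `safe_rate` helper and toy counts; a household-based broadband share would require adding the table total (B28002_001E, total households) to `VARIABLES`, which the notebook does not currently request:

```python
import pandas as pd

def safe_rate(numerator, denominator):
    """Element-wise ratio that returns NaN (not inf) where the denominator is zero or missing."""
    return numerator / denominator.where(denominator > 0)

# Toy counts: the middle tract has no households or residents
toy = pd.DataFrame({
    "households_with_internet": [350, 0, 45],
    "total_households": [700, 0, 90],
    "total_population": [1750, 0, 200],
})
# Share of households vs. the per-resident ratio used above
toy["internet_share"] = safe_rate(toy["households_with_internet"], toy["total_households"])
toy["per_resident_ratio"] = safe_rate(toy["households_with_internet"], toy["total_population"])
print(toy[["internet_share", "per_resident_ratio"]])
```

The same 350 subscribing households read as 50% of households but only 0.2 per resident, which is why the broadband figures in this notebook sit well below typical household subscription shares.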

4.6 Create Clean Output Table

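A cleaned dataset (df_viz) is created for plotting:

  • Columns renamed from ACS codes to readable labels
  • Rates converted to percentages (0-100)
  • Rent burden brackets consolidated into a single count
  • Tracts grouped into Low/Moderate and Middle/Upper income categories
  • Final feature set prepared for EDA and visualization

Negative and extreme median-income values are handled at the start of Section 5.1.
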
In [6]:
# Column mapping
output_columns = {
    "tract_6": "Tract Code (6-digit)",
    "NAME": "Tract Name",
    "Tract MFI": "FFIEC Tract MFI (2022)",
    "Tract income percentage": "FFIEC Tract income % (2022)",
    "Tract income level": "FFIEC Tract income level (2022)",
    "B01003_001E": "Total Population",
    "B23025_003E": "Labor Force",
    "B23025_004E": "Employed",
    "B23025_005E": "Unemployed",
    "employment_rate": "Employment Rate",
    "unemployment_rate": "Unemployment Rate",
    "B15003_021E": "Associates Degree",
    "B15003_022E": "Bachelors or Higher",
    "B25003_001E": "Total Housing Units",
    "B25003_002E": "Owner-Occupied",
    "homeownership_rate": "Homeownership Rate",
    "B25002_003E": "Vacant Units",
    "vacancy_rate": "Vacancy Rate",
    "rent_burdened_count": "Rent Burdened (30%+)",
    "B19013_001E": "Median Household Income",
    "B28002_002E": "With Broadband",
    "broadband_rate": "Broadband Rate",
    "B08301_010E": "Public Transit Commuters",
    "transit_rate": "Transit Rate",
}

# Create viz-ready dataframe
df_viz = df.copy()
df_viz = df_viz.rename(columns=output_columns)

# Drop B23025_002E (total labor force, including Armed Forces); "Labor Force" is the civilian labor force
if 'B23025_002E' in df_viz.columns:
    df_viz = df_viz.drop(columns=['B23025_002E'])
    print("✓ Dropped duplicate B23025_002E column")

# Clean up the Tract Name to show only the 6-digit code
df_viz['Tract Name'] = df_viz['Tract Code (6-digit)']

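# Additional derived variables
# Map FFIEC tiers explicitly so 'Unknown' or unmatched tracts are not counted as Middle/Upper
income_to_group = {
    'Low': 'Low/Moderate Income',
    'Moderate': 'Low/Moderate Income',
    'Middle': 'Middle/Upper Income',
    'Upper': 'Middle/Upper Income',
}
df_viz['Distress_Category'] = (
    df_viz['FFIEC Tract income level (2022)'].map(income_to_group).fillna('Unknown')
)
# Caution: B15003_022E counts bachelor's degrees only (universe: adults 25+), so dividing
# by total population gives an all-ages, bachelor's-only share rather than a true BA+ rate
df_viz['Bachelors_Plus_Rate'] = (df_viz['Bachelors or Higher'] / df_viz['Total Population'] * 100)
# Rent-burdened renter households as a share of all occupied units (owners + renters)
df_viz['Rent_Burden_Rate'] = (df_viz['Rent Burdened (30%+)'] / df_viz['Total Housing Units'] * 100)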

# Convert rates to percentages - From here on, all rate columns in df_viz are expressed as percentages (0–100), not proportions (0–1).
for col in ['Employment Rate', 'Unemployment Rate', 'Homeownership Rate', 
            'Vacancy Rate', 'Broadband Rate', 'Transit Rate']:
    df_viz[col] = df_viz[col] * 100

# Drop the four rent burden brackets (now summed in "Rent Burdened (30%+)") and
# B25002_001E (total housing units, needed only for the vacancy rate)
rent_burden_components = ['B25070_007E', 'B25070_008E', 'B25070_009E', 'B25070_010E', 'B25002_001E']
df_viz = df_viz.drop(columns=rent_burden_components)
print(f"✓ Dropped {len(rent_burden_components)} rent burden component columns")

print("✓ Visualization dataset ready")
print(f"Shape: {df_viz.shape}")
df_viz.head()
✓ Dropped duplicate B23025_002E column
✓ Dropped 5 rent burden component columns
✓ Visualization dataset ready
Shape: (84415, 31)
Out[6]:
Tract Name Labor Force Employed Unemployed Associates Degree Bachelors or Higher Total Housing Units Owner-Occupied Vacant Units Median Household Income ... Employment Rate Unemployment Rate Homeownership Rate Vacancy Rate Broadband Rate Transit Rate Rent Burdened (30%+) Distress_Category Bachelors_Plus_Rate Rent_Burden_Rate
0 020100 732 713 19 84 182 700 519 33 60563 ... 97.404372 2.595628 74.142857 4.502046 34.477212 0.000000 74 Middle/Upper Income 9.758713 10.571429
1 020200 919 868 51 86 163 544 429 136 57460 ... 94.450490 5.549510 78.860294 20.000000 22.944653 0.000000 66 Low/Moderate Income 8.758732 12.132353
2 020300 1781 1748 33 284 209 1305 912 126 77371 ... 98.147108 1.852892 69.885057 8.805031 33.505155 0.000000 190 Middle/Upper Income 5.985109 14.559387
3 020400 1854 1837 17 302 662 1666 1306 56 73191 ... 99.083064 0.916936 78.391357 3.252033 39.202408 1.329320 8 Middle/Upper Income 16.603963 0.480192
4 020501 2400 2386 14 319 763 1783 971 74 79953 ... 99.416667 0.583333 54.458777 3.984922 42.683815 1.747149 398 Middle/Upper Income 18.514924 22.321929

5 rows × 31 columns
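In ACS table B15003, whose universe is the population 25 and over, B15003_022E counts bachelor's degrees only; master's, professional, and doctoral degrees are B15003_023E through B15003_025E, and B15003_001E is the 25+ total. The `Bachelors_Plus_Rate` above therefore measures bachelor's-only holders as a share of all residents. A sketch of a conventional BA+ rate, assuming those extra variables are added to the API request (they are not in `VARIABLES` today); the helper name and toy counts are hypothetical:

```python
import pandas as pd

# B15003 (population 25+): 022 bachelor's, 023 master's, 024 professional school, 025 doctorate
BA_PLUS_VARS = ["B15003_022E", "B15003_023E", "B15003_024E", "B15003_025E"]

def bachelors_plus_rate(frame):
    """Percent of adults 25+ holding a bachelor's degree or higher (NaN when the base is zero)."""
    base = frame["B15003_001E"].where(frame["B15003_001E"] > 0)
    return frame[BA_PLUS_VARS].sum(axis=1) * 100 / base

toy = pd.DataFrame({
    "B15003_001E": [1000, 0],
    "B15003_022E": [200, 0],
    "B15003_023E": [80, 0],
    "B15003_024E": [15, 0],
    "B15003_025E": [5, 0],
})
print(bachelors_plus_rate(toy).tolist())
```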


5. Exploratory Data Analysis

5.1 Summary Statistics

Purpose: establish a contextual baseline to clarify whether local disparities reflect uniquely Pittsburgh-specific challenges or broader national patterns.

This section provides a high-level comparison of tract-level socioeconomic conditions across three geographies:

  • Allegheny County (local)
  • Pennsylvania (state)
  • Nationwide (benchmark)

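For each region, the table reports the following (Total Population is summed; every other metric is an unweighted mean of tract-level values, so small and large tracts count equally):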

  • Aggregate population
  • Average labor-market conditions (employment, unemployment)
  • Educational attainment (BA+ rate)
  • Housing stability indicators (homeownership, vacancy)
  • Broadband access
  • Median household income
In [7]:
# Create geographic filters and CLEAN the data
allegheny = df_viz[(df_viz['state'] == '42') & (df_viz['county'] == '003')].copy()
pennsylvania = df_viz[df_viz['state'] == '42'].copy()
nationwide = df_viz.copy()

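# Clean Median Household Income: the ACS API returns large negative sentinel codes
# (e.g., -666666666) when a median cannot be computed; treat those and implausibly large values as missing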
for df_geo in [allegheny, pennsylvania, nationwide]:
    df_geo.loc[df_geo['Median Household Income'] <= 0, 'Median Household Income'] = np.nan
    df_geo.loc[df_geo['Median Household Income'] > 500000, 'Median Household Income'] = np.nan

key_metrics = ['Total Population', 'Employment Rate', 'Unemployment Rate',
               'Bachelors_Plus_Rate', 'Median Household Income', 'Broadband Rate',
               'Homeownership Rate', 'Vacancy Rate']

print("="*80)
print("SUMMARY STATISTICS: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE")
print("="*80)

# Create comparison table
summary_comparison = pd.DataFrame({
    'Allegheny County': [
        allegheny['Total Population'].sum(),
        allegheny['Employment Rate'].mean(),
        allegheny['Unemployment Rate'].mean(),
        allegheny['Bachelors_Plus_Rate'].mean(),
        allegheny['Median Household Income'].mean(),
        allegheny['Broadband Rate'].mean(),
        allegheny['Homeownership Rate'].mean(),
        allegheny['Vacancy Rate'].mean()
    ],
    'Pennsylvania': [
        pennsylvania['Total Population'].sum(),
        pennsylvania['Employment Rate'].mean(),
        pennsylvania['Unemployment Rate'].mean(),
        pennsylvania['Bachelors_Plus_Rate'].mean(),
        pennsylvania['Median Household Income'].mean(),
        pennsylvania['Broadband Rate'].mean(),
        pennsylvania['Homeownership Rate'].mean(),
        pennsylvania['Vacancy Rate'].mean()
    ],
    'Nationwide': [
        nationwide['Total Population'].sum(),
        nationwide['Employment Rate'].mean(),
        nationwide['Unemployment Rate'].mean(),
        nationwide['Bachelors_Plus_Rate'].mean(),
        nationwide['Median Household Income'].mean(),
        nationwide['Broadband Rate'].mean(),
        nationwide['Homeownership Rate'].mean(),
        nationwide['Vacancy Rate'].mean()
    ]
}, index=key_metrics)

print(f"\n{'Geography':<25} {'Allegheny':<20} {'Pennsylvania':<20} {'Nationwide':<20}")
print("-" * 90)
print(f"{'Census Tracts':<25} {len(allegheny):<20,} {len(pennsylvania):<20,} {len(nationwide):<20,}")
print("\nValues (Total Population = SUM, others = MEAN):")
print("-" * 90)

# Format each row nicely
for metric in key_metrics:
    allegheny_val = summary_comparison.loc[metric, 'Allegheny County']
    pa_val = summary_comparison.loc[metric, 'Pennsylvania']
    nation_val = summary_comparison.loc[metric, 'Nationwide']
    
    if metric == 'Total Population':
        print(f"{metric:<25} {allegheny_val:>20,.0f} {pa_val:>20,.0f} {nation_val:>20,.0f}")
    elif metric == 'Median Household Income':
        print(f"{metric:<25} ${allegheny_val:>19,.2f} ${pa_val:>19,.2f} ${nation_val:>19,.2f}")
    else:
        print(f"{metric:<25} {allegheny_val:>20.2f} {pa_val:>20.2f} {nation_val:>20.2f}")

# Data quality note
print("\n" + "="*90)
print("DATA QUALITY NOTE:")
print(f"  Allegheny: {allegheny['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
print(f"  Pennsylvania: {pennsylvania['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
print(f"  Nationwide: {nationwide['Median Household Income'].isna().sum()} tracts with missing/invalid income data")
================================================================================
SUMMARY STATISTICS: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE
================================================================================

Geography                 Allegheny            Pennsylvania         Nationwide          
------------------------------------------------------------------------------------------
Census Tracts             394                  3,446                84,415              

Values (Total Population = SUM, others = MEAN):
------------------------------------------------------------------------------------------
Total Population                     1,245,310           12,989,208          331,097,593
Employment Rate                          94.42                94.31                94.36
Unemployment Rate                         5.58                 5.69                 5.64
Bachelors_Plus_Rate                      17.60                13.92                14.07
Median Household Income   $          75,812.23 $          77,527.23 $          80,716.70
Broadband Rate                           39.81                35.45                34.06
Homeownership Rate                       62.99                68.40                64.77
Vacancy Rate                             10.20                 9.94                10.73

==========================================================================================
DATA QUALITY NOTE:
  Allegheny: 13 tracts with missing/invalid income data
  Pennsylvania: 63 tracts with missing/invalid income data
  Nationwide: 1517 tracts with missing/invalid income data
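The table above averages tract values without weights, so a tract of 1,000 residents counts as much as one of 4,000. A population-weighted mean is one alternative; the sketch below uses a hypothetical `weighted_mean` helper and two toy tracts to show how far the two can diverge (for labor-market rates, the labor force would be the more natural weight):

```python
import numpy as np
import pandas as pd

def weighted_mean(values, weights):
    """Weighted mean that ignores rows where either the value or the weight is missing."""
    mask = values.notna() & weights.notna()
    return np.average(values[mask], weights=weights[mask])

tracts = pd.DataFrame({
    "Unemployment Rate": [10.0, 2.0],
    "Total Population": [1000, 4000],
})
print(tracts["Unemployment Rate"].mean())  # unweighted: each tract counts once
print(weighted_mean(tracts["Unemployment Rate"], tracts["Total Population"]))
```

Here the unweighted mean is 6.0 while the weighted mean is 3.6, because the larger tract has the lower rate.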

5.1 Key Patterns Observed

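• Averaged across tracts, Allegheny County's employment and unemployment rates (94.42% and 5.58%) closely track the statewide and national averages, suggesting broadly similar labor-market conditions.

• Educational attainment is noticeably higher in Allegheny (17.60 on the BA measure versus 13.92 for Pennsylvania and 14.07 nationwide), and broadband access is moderately higher (39.81 versus 35.45 and 34.06).

• Homeownership is lower in Allegheny (62.99%) than in Pennsylvania (68.40%) and, to a lesser degree, the U.S. (64.77%), a gap plausibly tied to the county's more urban development patterns and older rental housing stock.

• Vacancy (10.20%) sits between the Pennsylvania average (9.94%) and the national average (10.73%).

• Average tract median household income is lower in Allegheny ($75,812) than in Pennsylvania ($77,527) or nationwide ($80,717).

• Missing income values appear in all three geographies. Allegheny's 13 affected tracts are about 3.3% of its 394 tracts, a somewhat higher share than Pennsylvania (63 of 3,446, about 1.8%) or the nation (1,517 of 84,415, about 1.8%).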

5.2 Income Distribution Profiles: Allegheny vs. Pennsylvania vs. Nationwide

Using FFIEC income categories (Low, Moderate, Middle, Upper), this section compares how demographic and economic conditions vary within each region by income group.

For each income tier, the notebook reports:

  • Total population
  • Employment and unemployment rates
  • Educational attainment (BA+ share)
  • Median household income
  • Broadband access
  • Homeownership and vacancy rates
  • Sample sizes (tract counts)
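The nested print loops in the next cell build this breakdown row by row. The same per-tier summary can also be computed as one ordered table, which is easier to export or plot. A sketch with a hypothetical `income_tier_table` helper and toy rows (only two rate columns shown); `reindex` keeps the Low-to-Upper order and shows missing tiers as NaN rather than dropping them:

```python
import pandas as pd

TIER_ORDER = ["Low", "Moderate", "Middle", "Upper"]

def income_tier_table(geo_data, level_col="FFIEC Tract income level (2022)"):
    """Tract count, population sum, and mean unemployment by FFIEC tier, in Low-to-Upper order."""
    table = geo_data.groupby(level_col).agg(
        tracts=("Total Population", "size"),
        population=("Total Population", "sum"),
        unemployment=("Unemployment Rate", "mean"),
    )
    return table.reindex(TIER_ORDER)  # absent tiers become NaN rows; 'Unknown' is left out

toy = pd.DataFrame({
    "FFIEC Tract income level (2022)": ["Low", "Upper", "Upper", "Unknown"],
    "Total Population": [1000, 2000, 3000, 500],
    "Unemployment Rate": [10.0, 3.0, 4.0, 6.0],
})
print(income_tier_table(toy))
```

Tracts labeled Unknown fall outside the reindexed tiers, matching the four-column layout of the printed tables.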
In [8]:
print("\n" + "="*90)
print("INCOME LEVEL BREAKDOWN: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE")
print("="*90)

# Create comparison for each geography
for geo_name, geo_data in [('Allegheny County', allegheny), 
                            ('Pennsylvania', pennsylvania), 
                            ('Nationwide', nationwide)]:
    print(f"\n{geo_name.upper()}")
    print("="*90)
    
    # Group by income level
    grouped = geo_data.groupby('FFIEC Tract income level (2022)')
    
    # Calculate: SUM for Total Population, MEAN for everything else
    income_breakdown = grouped.agg({
        'Total Population': 'sum',
        'Employment Rate': 'mean',
        'Unemployment Rate': 'mean',
        'Bachelors_Plus_Rate': 'mean',
        'Median Household Income': 'mean',
        'Broadband Rate': 'mean',
        'Homeownership Rate': 'mean',
        'Vacancy Rate': 'mean'
    })
    
    # Format and display
    print(f"\n{'Metric':<30} {'Low':<15} {'Moderate':<15} {'Middle':<15} {'Upper':<15}")
    print("-"*90)
    
    for metric in key_metrics:
        if metric == 'Total Population':
            print(f"{metric:<30}", end="")
            for income_level in ['Low', 'Moderate', 'Middle', 'Upper']:
                if income_level in income_breakdown.index:
                    val = income_breakdown.loc[income_level, metric]
                    print(f"{val:>14,.0f} ", end="")
                else:
                    print(f"{'N/A':>14} ", end="")
            print()
        elif metric == 'Median Household Income':
            print(f"{metric:<30}", end="")
            for income_level in ['Low', 'Moderate', 'Middle', 'Upper']:
                if income_level in income_breakdown.index:
                    val = income_breakdown.loc[income_level, metric]
                    print(f"${val:>13,.0f} ", end="")
                else:
                    print(f"{'N/A':>14} ", end="")
            print()
        else:
            print(f"{metric:<30}", end="")
            for income_level in ['Low', 'Moderate', 'Middle', 'Upper']:
                if income_level in income_breakdown.index:
                    val = income_breakdown.loc[income_level, metric]
                    print(f"{val:>14.2f} ", end="")
                else:
                    print(f"{'N/A':>14} ", end="")
            print()
    
    # Sample sizes
    print("\n" + "-"*90)
    print("Sample Sizes (number of census tracts):")
    tract_counts = geo_data['FFIEC Tract income level (2022)'].value_counts()
    print(f"{'Low:':<15} {tract_counts.get('Low', 0):>6,}  |  ", end="")
    print(f"{'Moderate:':<15} {tract_counts.get('Moderate', 0):>6,}  |  ", end="")
    print(f"{'Middle:':<15} {tract_counts.get('Middle', 0):>6,}  |  ", end="")
    print(f"{'Upper:':<15} {tract_counts.get('Upper', 0):>6,}")
==========================================================================================
INCOME LEVEL BREAKDOWN: ALLEGHENY vs PENNSYLVANIA vs NATIONWIDE
==========================================================================================

ALLEGHENY COUNTY
==========================================================================================

Metric                         Low             Moderate        Middle          Upper          
------------------------------------------------------------------------------------------
Total Population                      91,223        219,912        437,560        479,547 
Employment Rate                        89.25          92.55          95.30          96.72 
Unemployment Rate                      10.75           7.45           4.70           3.28 
Bachelors_Plus_Rate                     7.51          12.87          18.87          23.26 
Median Household Income       $       33,979 $       49,919 $       72,018 $      114,459 
Broadband Rate                         33.57          40.04          41.72          39.32 
Homeownership Rate                     34.57          55.04          67.35          75.52 
Vacancy Rate                           19.74          13.67           8.01           6.46 

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low:                38  |  Moderate:           83  |  Middle:            137  |  Upper:             115

PENNSYLVANIA
==========================================================================================

Metric                         Low             Moderate        Middle          Upper          
------------------------------------------------------------------------------------------
Total Population                     683,935      2,456,728      6,341,164      3,375,388 
Employment Rate                        88.66          92.35          95.12          95.95 
Unemployment Rate                      11.34           7.65           4.88           4.05 
Bachelors_Plus_Rate                     6.34           9.92          13.44          20.70 
Median Household Income       $       36,741 $       54,639 $       75,394 $      113,158 
Broadband Rate                         31.73          34.92          35.59          37.41 
Homeownership Rate                     36.68          56.63          74.12          76.89 
Vacancy Rate                           13.83          12.39           9.94           6.38 

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low:               206  |  Moderate:          711  |  Middle:          1,630  |  Upper:             813

NATIONWIDE
==========================================================================================

Metric                         Low             Moderate        Middle          Upper          
------------------------------------------------------------------------------------------
Total Population                  18,412,899     71,654,894    138,968,321     95,017,217 
Employment Rate                        89.52          92.95          95.01          95.91 
Unemployment Rate                      10.48           7.05           4.99           4.09 
Bachelors_Plus_Rate                     6.70           9.57          13.46          20.76 
Median Household Income       $       39,011 $       56,105 $       76,431 $      118,666 
Broadband Rate                         30.17          32.38          34.42          36.22 
Homeownership Rate                     31.72          52.88          70.10          76.12 
Vacancy Rate                           13.30          11.47          11.00           8.76 

------------------------------------------------------------------------------------------
Sample Sizes (number of census tracts):
Low:             5,430  |  Moderate:       18,811  |  Middle:         34,720  |  Upper:          22,302

5.2 Key Patterns: Income Distribution Profiles¶

• Across Allegheny County, Pennsylvania, and the U.S., outcomes vary sharply by FFIEC income tier, and low-income tracts post the weakest average value on every indicator.

• Low-income tracts have much lower educational attainment (6.3–7.5% BA+ versus 20.7–23.3% in upper-income tracts), and in every geography this education gradient moves in step with higher unemployment.

• Upper-income tracts in Allegheny County have a higher average BA+ rate (23.3%) than upper-income tracts statewide (20.7%) or nationally (20.8%), consistent with the region's concentration of highly educated neighborhoods.

• Homeownership follows the steepest income gradient: the gap between low-income and upper-income tracts is roughly 40–44 percentage points in every geography (about 41 points in Allegheny, 40 statewide, and 44 nationally).

• Vacancy rates are markedly higher in low-income tracts, especially in Allegheny County (19.7% versus 13.8% statewide and 13.3% nationally), indicating localized housing distress.

• Broadband access rises with income tier statewide and nationally, but in Allegheny it peaks in middle-income tracts (41.7%) and dips in upper-income tracts (39.3%). In every geography the low-to-upper broadband gap is smaller than the gaps for education or housing.
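The tier gradients described above can be checked directly against the Allegheny table. The sketch below copies the Allegheny tier means from the printed output (rounded to two decimals) and computes the Upper-minus-Low gap in percentage points for each rate metric, plus whether each metric moves in one direction across tiers. The helper names are illustrative, not functions from this notebook.

```python
# Allegheny County tier means, copied from the income-level breakdown above
# (unweighted means across tracts, rounded to two decimals).
allegheny_tiers = {
    "Employment Rate":     {"Low": 89.25, "Moderate": 92.55, "Middle": 95.30, "Upper": 96.72},
    "Unemployment Rate":   {"Low": 10.75, "Moderate": 7.45, "Middle": 4.70, "Upper": 3.28},
    "Bachelors_Plus_Rate": {"Low": 7.51, "Moderate": 12.87, "Middle": 18.87, "Upper": 23.26},
    "Broadband Rate":      {"Low": 33.57, "Moderate": 40.04, "Middle": 41.72, "Upper": 39.32},
    "Homeownership Rate":  {"Low": 34.57, "Moderate": 55.04, "Middle": 67.35, "Upper": 75.52},
    "Vacancy Rate":        {"Low": 19.74, "Moderate": 13.67, "Middle": 8.01, "Upper": 6.46},
}

TIER_ORDER = ["Low", "Moderate", "Middle", "Upper"]


def upper_minus_low(tiers):
    """Percentage-point gap between the Upper and Low tiers."""
    return round(tiers["Upper"] - tiers["Low"], 2)


def is_monotonic(tiers):
    """True if the metric moves in a single direction from Low to Upper."""
    values = [tiers[t] for t in TIER_ORDER]
    rising = all(a <= b for a, b in zip(values, values[1:]))
    falling = all(a >= b for a, b in zip(values, values[1:]))
    return rising or falling


gaps = {metric: upper_minus_low(t) for metric, t in allegheny_tiers.items()}
largest = max(gaps, key=lambda m: abs(gaps[m]))

for metric, tiers in allegheny_tiers.items():
    print(f"{metric:<22} gap {gaps[metric]:>7.2f} pp  monotonic={is_monotonic(tiers)}")
print("Largest absolute gap:", largest)
```

On these inputs homeownership has the largest absolute gap (about 41 points), and broadband is the only metric that is not monotonic across tiers.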

5.3 Opportunity Gaps: Lower-Income vs. Higher-Income Tracts¶

To highlight structural inequities more clearly, this section collapses the FFIEC categories into:

  • Low/Moderate Income Tracts
  • Middle/Upper Income Tracts

For each geography (Allegheny, Pennsylvania, U.S.), it reports:

  • Employment rate & gap
  • Unemployment rate & gap
  • Broadband access & gap
  • Homeownership & gap
  • Vacancy & gap
  • Transit use differences
  • Corresponding population and tract counts
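The two-way collapse can be sketched as an explicit mapping. This is a minimal sketch, assuming the FFIEC column holds "Low", "Moderate", "Middle", "Upper", and possibly other values; `collapse_income_level` is a hypothetical helper, not necessarily how `Distress_Category` was built in this notebook. Keeping unmapped tiers out of both groups matters here: the sample-size table in this section reports 273 Middle/Upper tracts in Allegheny, while the Middle and Upper tiers in the 5.2 table sum to 137 + 115 = 252, which suggests that tracts without a Low-to-Upper designation were assigned to Middle/Upper.

```python
from collections import Counter

# Hypothetical mapping from FFIEC tract income level to the two comparison groups.
# Anything not listed (e.g. "Unknown" or None) maps to None instead of a default group.
GROUP_FOR_LEVEL = {
    "Low": "Low/Moderate Income",
    "Moderate": "Low/Moderate Income",
    "Middle": "Middle/Upper Income",
    "Upper": "Middle/Upper Income",
}


def collapse_income_level(level):
    """Return the two-way group for an FFIEC income level, or None if unmapped."""
    return GROUP_FOR_LEVEL.get(level)


def group_counts(levels):
    """Count tracts per collapsed group, keeping unmapped tracts visible under None."""
    return Counter(collapse_income_level(level) for level in levels)


# Allegheny tier counts from the 5.2 sample-size table. The 21 "Unknown" tracts are a
# hypothetical stand-in for the 273 - 252 tracts with no Low-to-Upper designation.
levels = (["Low"] * 38 + ["Moderate"] * 83 + ["Middle"] * 137
          + ["Upper"] * 115 + ["Unknown"] * 21)
counts = group_counts(levels)
print(counts)
```

In pandas, the same dictionary can be passed to `Series.map`, which returns NaN for values not in the mapping, so unassigned tracts stay visible instead of silently joining a group.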
In [9]:
print("\n" + "="*90)
print("OPPORTUNITY GAPS: LOW/MODERATE vs MIDDLE/UPPER INCOME")
print("Comparison across Allegheny County, Pennsylvania, and Nationwide")
print("="*90)

gap_metrics = ['Employment Rate', 'Unemployment Rate', 'Broadband Rate', 
               'Homeownership Rate', 'Vacancy Rate', 'Transit Rate']

# Calculate gaps for each geography
results = []

for geo_name, geo_data in [('Allegheny', allegheny), 
                            ('Pennsylvania', pennsylvania), 
                            ('Nationwide', nationwide)]:
    
    low_mod = geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income'][gap_metrics].mean()
    mid_upper = geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income'][gap_metrics].mean()
    gap = mid_upper - low_mod
    
    # Get population and tract counts
    low_mod_pop = geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income']['Total Population'].sum()
    mid_upper_pop = geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income']['Total Population'].sum()
    low_mod_tracts = len(geo_data[geo_data['Distress_Category'] == 'Low/Moderate Income'])
    mid_upper_tracts = len(geo_data[geo_data['Distress_Category'] == 'Middle/Upper Income'])
    
    results.append({
        'Geography': geo_name,
        'Low/Mod_n': low_mod_tracts,
        'Mid/Upper_n': mid_upper_tracts,
        'Low/Mod_pop': low_mod_pop,
        'Mid/Upper_pop': mid_upper_pop,
        **{f'{metric}_LowMod': low_mod[metric] for metric in gap_metrics},
        **{f'{metric}_MidUpper': mid_upper[metric] for metric in gap_metrics},
        **{f'{metric}_Gap': gap[metric] for metric in gap_metrics}
    })

# Display in organized format
for metric in gap_metrics:
    print(f"\n{metric.upper()}")
    print("-"*90)
    print(f"{'Geography':<15} {'Low/Moderate':<15} {'Middle/Upper':<15} {'Gap':<15} {'% Difference':<15}")
    print("-"*90)
    
    for result in results:
        low_mod_val = result[f'{metric}_LowMod']
        mid_upper_val = result[f'{metric}_MidUpper']
        gap_val = result[f'{metric}_Gap']
        pct_diff = (gap_val / low_mod_val * 100) if low_mod_val != 0 else 0
        
        print(f"{result['Geography']:<15} {low_mod_val:>14.2f} {mid_upper_val:>14.2f} {gap_val:>14.2f} {pct_diff:>14.1f}%")

# Sample sizes with population totals
print("\n" + "="*90)
print("SAMPLE SIZES & POPULATIONS")
print("="*90)
print(f"{'Geography':<15} {'Category':<20} {'Tracts':<15} {'Total Population':<20}")
print("-"*90)

for result in results:
    print(f"{result['Geography']:<15} {'Low/Moderate Income':<20} {result['Low/Mod_n']:>14,} {result['Low/Mod_pop']:>19,.0f}")
    print(f"{'':15} {'Middle/Upper Income':<20} {result['Mid/Upper_n']:>14,} {result['Mid/Upper_pop']:>19,.0f}")
    print()
==========================================================================================
OPPORTUNITY GAPS: LOW/MODERATE vs MIDDLE/UPPER INCOME
Comparison across Allegheny County, Pennsylvania, and Nationwide
==========================================================================================
EMPLOYMENT RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                91.51          95.76           4.25            4.6%
Pennsylvania             91.52          95.34           3.81            4.2%
Nationwide               92.18          95.25           3.07            3.3%

UNEMPLOYMENT RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                 8.49           4.24          -4.25          -50.0%
Pennsylvania              8.48           4.66          -3.81          -45.0%
Nationwide                7.82           4.75          -3.07          -39.3%

BROADBAND RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                38.00          40.64           2.64            6.9%
Pennsylvania             34.20          35.91           1.71            5.0%
Nationwide               31.89          34.95           3.06            9.6%

HOMEOWNERSHIP RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                48.61          69.63          21.01           43.2%
Pennsylvania             52.15          74.40          22.25           42.7%
Nationwide               48.14          71.59          23.45           48.7%

VACANCY RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                15.58           7.72          -7.86          -50.5%
Pennsylvania             12.71           8.91          -3.80          -29.9%
Nationwide               11.88          10.26          -1.62          -13.6%

TRANSIT RATE
------------------------------------------------------------------------------------------
Geography       Low/Moderate    Middle/Upper    Gap             % Difference   
------------------------------------------------------------------------------------------
Allegheny                 6.02           3.13          -2.89          -48.1%
Pennsylvania              3.49           1.67          -1.82          -52.1%
Nationwide                2.64           1.54          -1.10          -41.7%

==========================================================================================
SAMPLE SIZES & POPULATIONS
==========================================================================================
Geography       Category             Tracts          Total Population    
------------------------------------------------------------------------------------------
Allegheny       Low/Moderate Income             121             311,135
                Middle/Upper Income             273             934,175

Pennsylvania    Low/Moderate Income             917           3,140,663
                Middle/Upper Income           2,529           9,848,545

Nationwide      Low/Moderate Income          24,241          90,067,793
                Middle/Upper Income          60,174         241,029,800

5.3 Key Patterns: Opportunity Gaps Between Lower-Income and Higher-Income Tracts¶

• Unemployment in low/moderate-income tracts averages 7.8–8.5%, roughly 1.6 to 2 times the middle/upper-income rate of 4.2–4.8% at every geographic level. In absolute terms this is a 3–4 percentage-point gap; the "% Difference" column (−39% to −50%) expresses the same gap relative to the low/moderate value. Because the employment rate here is the complement of the unemployment rate, the employment gap is the same 3–4 points.

• Homeownership shows the widest disparity, with middle/upper-income tracts maintaining ownership rates 21–23 percentage points above lower-income tracts, consistent with deep structural divides in wealth and housing stability.

• Vacancy gaps are particularly striking in Allegheny County, where low/moderate-income tracts average roughly twice the vacancy rate of middle/upper-income tracts (15.6% vs. 7.7%), signaling concentrated neighborhood distress.

• Transit use is roughly 1.7 to 2.1 times higher in lower-income tracts at every level, and highest in Allegheny (6.0% vs. 3.1%). This may reflect lower car access, greater transit availability in denser neighborhoods, or both; these data alone cannot separate the two.

• Broadband gaps persist but are modest (1.7–3.1 percentage points), far smaller than the homeownership gap and no larger than the employment gap.

• Overall, Allegheny's opportunity gaps mirror statewide and national patterns, but the vacancy gap is much sharper locally (7.9 points vs. 3.8 statewide and 1.6 nationally), and the transit gap is somewhat larger (2.9 points vs. 1.8 and 1.1).
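The table's "% Difference" column is computed relative to the Low/Moderate value, so it answers "how much lower is the Middle/Upper rate", not "how much higher is the Low/Moderate rate". The short sketch below expresses one gap three ways, using the Allegheny unemployment and vacancy means from the table; `gap_measures` is an illustrative helper, not part of the notebook.

```python
def gap_measures(low_mod, mid_upper):
    """Express one gap three ways (inputs are group means, in percent)."""
    pp_gap = mid_upper - low_mod               # absolute gap, percentage points
    rel_to_low_mod = pp_gap / low_mod * 100    # same convention as "% Difference"
    ratio = low_mod / mid_upper                # how many times higher Low/Mod is
    return pp_gap, rel_to_low_mod, ratio


for metric, low_mod, mid_upper in [
    ("Unemployment Rate", 8.49, 4.24),
    ("Vacancy Rate", 15.58, 7.72),
]:
    pp_gap, rel, ratio = gap_measures(low_mod, mid_upper)
    print(f"{metric}: {pp_gap:+.2f} pp, {rel:+.1f}% vs Low/Mod, Low/Mod is {ratio:.2f}x")
```

Because the inputs are the rounded means from the table, the relative figures can differ from the printed column in the last digit.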


6. Visualizations¶

The following visualizations explore economic opportunity indicators across Allegheny County census tracts.

6.1 Opportunity Gaps Across Income Levels¶

• Description: Horizontal grouped bar chart comparing the relative (percent) difference between Middle/Upper and Low/Moderate income tracts across six key metrics (Employment Rate, Unemployment Rate, Broadband Access, Homeownership Rate, Vacancy Rate, Bachelor's Degree+).

• Objective: Establish the magnitude of opportunity gaps and show that disparities span multiple dimensions rather than one or two isolated metrics.

• Methodology: Calculate mean values for each metric within Low/Moderate and Middle/Upper income groups for three geographies (Allegheny County, Pennsylvania, Nationwide), then display each gap as a percentage of the Low/Moderate mean, with bars color-coded by geography.
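The two gap measures used in this section can be written explicitly. With $\bar{x}_{LM}$ and $\bar{x}_{MU}$ denoting the mean of a metric across Low/Moderate and Middle/Upper tracts, the 5.3 table's "Gap" column is the absolute difference $\Delta_{pp}$, while this chart plots the relative difference $\Delta_{\%}$:

$$
\Delta_{pp} = \bar{x}_{MU} - \bar{x}_{LM},
\qquad
\Delta_{\%} = 100 \cdot \frac{\bar{x}_{MU} - \bar{x}_{LM}}{\bar{x}_{LM}}
$$

For Allegheny homeownership, $\Delta_{pp} = 69.63 - 48.61 = 21.02$ (the table's 21.01 reflects unrounded means) and $\Delta_{\%} = 100 \cdot 21.02 / 48.61 \approx 43.2$. A metric with a small base, such as unemployment, can therefore show a large $\Delta_{\%}$ even when $\Delta_{pp}$ is only a few points.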

In [10]:
# 6.1 Opportunity gaps across income levels

import pandas as pd
import plotly.graph_objects as go

# Metrics to compare between low/moderate- and middle/upper-income tracts
gap_metrics = [
    "Employment Rate",
    "Unemployment Rate",
    "Broadband Rate",
    "Homeownership Rate",
    "Vacancy Rate",
    "Bachelors_Plus_Rate",
]

gap_data = []

# Compute percentage differences by geography
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    low_mod = geo_data[geo_data["Distress_Category"] == "Low/Moderate Income"][gap_metrics].mean()
    mid_upper = geo_data[geo_data["Distress_Category"] == "Middle/Upper Income"][gap_metrics].mean()

    for metric in gap_metrics:
        gap = mid_upper[metric] - low_mod[metric]
        pct_diff = (gap / low_mod[metric] * 100) if low_mod[metric] != 0 else 0

        gap_data.append(
            {
                "Geography": geo_name,
                "Metric": metric,
                "Gap": gap,
                "Percent_Diff": pct_diff,
            }
        )

gap_df = pd.DataFrame(gap_data)

# Colors by geography
colors = {
    "Allegheny County": "#1f77b4",  # blue
    "Pennsylvania": "#9467bd",      # purple
    "Nationwide": "#ff7f0e",        # orange
}

fig = go.Figure()

for geo in ["Allegheny County", "Pennsylvania", "Nationwide"]:
    geo_data = gap_df[gap_df["Geography"] == geo]

    fig.add_trace(
        go.Bar(
            y=geo_data["Metric"],
            x=geo_data["Percent_Diff"],
            name=geo,
            orientation="h",
            marker=dict(color=colors[geo], opacity=0.8),
            text=geo_data["Percent_Diff"].round(1).astype(str) + "%",
            textposition="outside",
            textfont=dict(size=11, family="Arial, sans-serif"),
            hovertemplate=(
                "<b>%{y}</b><br>"
                f"<b>{geo}</b><br>"
                "Gap: %{x:.1f}%<br>"
                "<extra></extra>"
            ),
        )
    )

fig.update_layout(
    title=dict(
        text="Opportunity Gaps Across Income Levels",
        x=0.5,
        xanchor="center",
        font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
    ),
    annotations=[
        dict(
            text="Relative difference: (Middle/Upper minus Low/Moderate) / Low/Moderate mean, in percent",
            x=0.5,
            y=-0.15,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=12, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    xaxis=dict(
        title="Relative Difference (% of Low/Moderate mean)",
        title_font=dict(size=14, family="Arial, sans-serif"),
        gridcolor="#ecf0f1",
        tickfont=dict(size=12, family="Arial, sans-serif"),
    ),
    yaxis=dict(
        title="",
        tickfont=dict(size=13, family="Arial, sans-serif"),
    ),
    barmode="group",
    height=550,
    width=1100,
    template="plotly_white",
    plot_bgcolor="white",
    paper_bgcolor="white",
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.12,
        xanchor="center",
        x=0.5,
        font=dict(size=13, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.8)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    font=dict(family="Arial, sans-serif"),
    margin=dict(t=120, b=100, l=150, r=80),
)

fig.show()

Interpretation: Across every dimension measured, middle/upper-income tracts outperform low/moderate-income tracts on average. In relative terms, two gaps stand out as particularly severe: unemployment and education.

The unemployment disparity is stark: middle/upper-income tracts have unemployment rates 39–50% lower than low/moderate-income tracts. Put the other way, low/moderate-income tracts average 7.8–8.5% unemployment, roughly 1.6 to 2 times the 4.2–4.8% in middle/upper-income tracts. The absolute gap is 3–4 percentage points, a meaningful difference in labor-market access and economic stability.

Education shows an equally dramatic divide. The average Bachelor's+ rate is 72–84% higher in middle/upper-income tracts than in low/moderate-income tracts, suggesting that higher education remains concentrated in already-advantaged communities. Homeownership follows a similar pattern, with ownership rates 43–49% higher (21–23 percentage points), reflecting both affordability barriers and limited wealth-building opportunities in lower-income neighborhoods.

Infrastructure shows smaller but real disparities. Broadband rates are 5–10% higher in middle/upper-income tracts (1.7–3.1 percentage points), and lower access can limit remote work, online education, and digital services in disadvantaged areas. The employment bars (3–5%) look small next to the unemployment bars, but they describe the same 3–4 point gap: the employment rate is the complement of the unemployment rate, so the two differ only in the base each gap is measured against.

What's striking is the consistency: on most metrics Allegheny County closely mirrors Pennsylvania and nationwide patterns, with vacancy the main exception. This suggests systemic inequities rather than local anomalies, and that addressing them will require more than isolated, place-specific interventions.
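The employment and unemployment bars in this chart describe one underlying gap. In these data the employment rate is the complement of the unemployment rate (each group's two means sum to 100), so the two percentage-point gaps are equal in size and opposite in sign; only the relative differences diverge. A small check using the Low/Moderate and Middle/Upper means from the 5.3 table, with `pp_gaps` as an illustrative helper:

```python
# Low/Moderate and Middle/Upper group means from the 5.3 opportunity-gap table,
# stored as (low_mod, mid_upper) pairs.
groups = {
    "Allegheny":    {"emp": (91.51, 95.76), "unemp": (8.49, 4.24)},
    "Pennsylvania": {"emp": (91.52, 95.34), "unemp": (8.48, 4.66)},
    "Nationwide":   {"emp": (92.18, 95.25), "unemp": (7.82, 4.75)},
}


def pp_gaps(g):
    """Middle/Upper minus Low/Moderate, in percentage points, for both metrics."""
    (emp_lm, emp_mu), (un_lm, un_mu) = g["emp"], g["unemp"]
    return emp_mu - emp_lm, un_mu - un_lm


for geo, g in groups.items():
    # Complement check: employment + unemployment is ~100 in each group.
    for emp, unemp in zip(g["emp"], g["unemp"]):
        assert abs(emp + unemp - 100) < 0.02
    emp_gap, unemp_gap = pp_gaps(g)
    emp_rel = emp_gap / g["emp"][0] * 100        # relative to Low/Moderate
    unemp_rel = unemp_gap / g["unemp"][0] * 100  # relative to Low/Moderate
    print(f"{geo:<13} pp gaps {emp_gap:+.2f} / {unemp_gap:+.2f}   "
          f"relative {emp_rel:+.1f}% / {unemp_rel:+.1f}%")
```

The percentage-point gaps cancel exactly, while the unemployment relative difference is roughly ten times larger because its base (4–9%) is roughly a tenth of the employment base (91–96%).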

6.2 Economic Opportunity Scorecard by Geography and Income Level¶

• Description: Heatmap displaying mean values for seven economic indicators across four income levels (Low, Moderate, Middle, Upper) and three geographies (Allegheny, Pennsylvania, Nationwide).

• Objective: Provide a comprehensive, at-a-glance comparison showing how outcomes vary simultaneously across income levels and geographic scales.

• Methodology: Calculate mean values for each metric by income level and geography. Min-max normalize each metric to a 0–100 scale across all twelve geography-income rows, inverting Unemployment and Vacancy rates so that a higher score always means a better outcome, and color the cells so that blue = better and red = worse. Cells display the original (unnormalized) values, with color intensity representing relative performance.
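The normalization step can be sketched on its own. Below is a minimal min-max rescaling to 0–100 with inversion for lower-is-better metrics, following the same logic as the heatmap cell; `minmax_score` is an illustrative name. The example rescales only Allegheny's four vacancy tiers, whereas the heatmap rescales each metric across all twelve rows, so 50 marks the midpoint of the observed range rather than an average.

```python
def minmax_score(values, lower_is_better=False):
    """Rescale values to 0-100 so that 100 is always the best observed value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Flat metric: every group gets the midpoint score.
        return [50.0 for _ in values]
    scores = [(v - lo) / (hi - lo) * 100 for v in values]
    if lower_is_better:
        # Invert so the lowest raw value (e.g. lowest vacancy) scores 100.
        scores = [100 - s for s in scores]
    return scores


# Allegheny vacancy rates by tier (Low, Moderate, Middle, Upper) from section 5.
vacancy = [19.74, 13.67, 8.01, 6.46]
print([round(s, 1) for s in minmax_score(vacancy, lower_is_better=True)])
# → [0.0, 45.7, 88.3, 100.0]
```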

In [11]:
# 6.2 Economic opportunity scorecard heatmap

import numpy as np
import pandas as pd
import plotly.graph_objects as go

# Metrics to include in the heatmap
metrics_for_heatmap = [
    "Employment Rate",
    "Unemployment Rate",
    "Bachelors_Plus_Rate",
    "Median Household Income",
    "Broadband Rate",
    "Homeownership Rate",
    "Vacancy Rate",
]

heatmap_data = []

# Aggregate mean values by geography and FFIEC income level
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    for income_level in ["Low", "Moderate", "Middle", "Upper"]:
        subset = geo_data[geo_data["FFIEC Tract income level (2022)"] == income_level]

        if len(subset) == 0:
            continue

        row_data = {
            "Geography": geo_name,
            "Income Level": income_level,
        }

        for metric in metrics_for_heatmap:
            if metric == "Median Household Income":
                # Convert to thousands for display
                row_data[metric] = subset[metric].mean() / 1000
            else:
                row_data[metric] = subset[metric].mean()

        heatmap_data.append(row_data)

heatmap_df = pd.DataFrame(heatmap_data)

# Row labels (geography + income tier)
heatmap_df["Label"] = heatmap_df["Geography"] + " - " + heatmap_df["Income Level"]

# Normalize all metrics to a 0–100 scale where higher = better outcome
z_data_normalized = []
display_values = []

for metric in metrics_for_heatmap:
    col_data = heatmap_df[metric].values
    min_val, max_val = col_data.min(), col_data.max()

    if max_val == min_val:
        # Avoid divide-by-zero; flat metric across all groups
        normalized = np.full_like(col_data, 50, dtype=float)
    else:
        if metric in ["Unemployment Rate", "Vacancy Rate"]:
            # Lower is better: invert so high normalized = better outcome
            normalized = 100 - ((col_data - min_val) / (max_val - min_val) * 100)
        else:
            # Higher is better
            normalized = (col_data - min_val) / (max_val - min_val) * 100

    z_data_normalized.append(normalized)
    display_values.append(col_data)

z_data_normalized = np.array(z_data_normalized).T  # shape: (rows, metrics)
display_values = np.array(display_values).T

# Axis labels
y_labels = heatmap_df["Label"].values
x_labels = [
    "Employment<br>Rate (%)",
    "Unemployment<br>Rate (%)",
    "Bachelor's<br>Degree+ (%)",
    "Median HH<br>Income ($K)",
    "Broadband<br>Access (%)",
    "Homeownership<br>Rate (%)",
    "Vacancy<br>Rate (%)",
]

# Custom hover text with original values
hover_text = []
for i, row in heatmap_df.iterrows():
    row_hover = []
    for j, metric in enumerate(metrics_for_heatmap):
        val = display_values[i][j]
        if metric == "Median Household Income":
            formatted_val = f"${val:.1f}K"
        else:
            formatted_val = f"{val:.1f}%"
        row_hover.append(
            f"<b>{metric}</b><br>{formatted_val}<br><b>{row['Label']}</b>"
        )
    hover_text.append(row_hover)

# Heatmap: colors reflect normalized scores, text shows original metric values
fig = go.Figure(
    data=go.Heatmap(
        z=z_data_normalized,
        x=x_labels,
        y=y_labels,
        colorscale="RdYlBu",  # low normalized score -> red (worse), high -> blue (better); "RdYlBu_r" would flip this
        text=np.round(display_values, 1),
        texttemplate="%{text}",
        textfont={"size": 10},
        hovertext=hover_text,
        hoverinfo="text",
        colorbar=dict(
            # title.side / title.font replace the deprecated titleside / titlefont
            title=dict(
                text="Outcome<br>Quality",
                side="right",
                font=dict(size=12, family="Arial, sans-serif"),
            ),
            tickmode="array",
            tickvals=[0, 50, 100],
            ticktext=["Worse", "Mid-range", "Better"],
            tickfont=dict(size=11, family="Arial, sans-serif"),
        ),
    )
)

# White separator lines between geography blocks (assumes 4 income rows per geography)
fig.add_shape(
    type="line",
    x0=-0.5,
    x1=len(x_labels) - 0.5,
    y0=3.5,
    y1=3.5,
    line=dict(color="white", width=3),
)
fig.add_shape(
    type="line",
    x0=-0.5,
    x1=len(x_labels) - 0.5,
    y0=7.5,
    y1=7.5,
    line=dict(color="white", width=3),
)

fig.update_layout(
    title=dict(
        text="Economic Opportunity Scorecard by Geography and Income Level",
        x=0.5,
        xanchor="center",
        font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
    ),
    annotations=[
        dict(
            text=(
                "Mean values across census tracts | "
                "Blue = better outcomes, Red = worse outcomes (normalized across all metrics)"
            ),
            x=0.5,
            y=-0.12,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    xaxis=dict(
        title="",
        side="bottom",
        tickfont=dict(size=11, family="Arial, sans-serif"),
    ),
    yaxis=dict(
        title="",
        tickfont=dict(size=11, family="Arial, sans-serif"),
    ),
    height=700,
    width=1100,
    template="plotly_white",
    margin=dict(t=100, b=100, l=200, r=150),
)

fig.show()

Interpretation:

The heatmap reveals a stark pattern: low-income rows fall at the "Worse" end of the color scale on nearly every metric, while upper-income rows fall at the "Better" end. Disadvantage is not confined to one or two indicators; it spans nearly all of them.

Looking across the rows, Allegheny's low-income tracts score poorly on employment (89.2%), education (7.5% Bachelor's+), median income ($34K), broadband (33.6%), and homeownership (34.6%). Upper-income tracts lead on nearly every dimension: 96.7% employment, 23.3% Bachelor's+, $114.5K median income, and 75.5% homeownership. Broadband is the exception, peaking in middle-income tracts (41.7%) rather than upper-income tracts (39.3%). Otherwise, the colors shift steadily as income rises, showing a graded progression rather than a binary divide.

Critically, and consistent with 6.1, this pattern holds at all three geographic scales: Allegheny looks much like Pennsylvania, which in turn looks much like the nation. That consistency suggests these are not problems unique to Pittsburgh's post-industrial transition but reflect broader, systemic inequities across the U.S.

Policy Implication: Single-issue interventions are unlikely to suffice. On average, tracts that lag on employment also lag on education, infrastructure, and housing stability, so effective policy likely requires coordinated, multi-dimensional approaches that address economic opportunity holistically rather than treating symptoms in isolation.

6.3 Opportunity Gaps Across Key Economic Indicators¶

• Description: 2×3 subplot grid showing grouped bar charts for six metrics (Employment Rate, Bachelor's Degree+, Broadband Access, Homeownership Rate, Unemployment Rate, Vacancy Rate), each comparing Low/Moderate vs Middle/Upper income tracts across three geographies.

• Objective: Examine each dimension of opportunity independently while maintaining cross-geographic comparability, showing where gaps are largest and whether Allegheny County departs from Pennsylvania and nationwide averages.

• Methodology: For each of the six metrics, calculate mean values for Low/Moderate and Middle/Upper income tracts in each geography (Allegheny County, Pennsylvania, Nationwide), and display them as side-by-side grouped bars: Low/Moderate in red and Middle/Upper in blue, with the same colors in every panel to support pattern recognition. The percentage-point gap (Middle/Upper minus Low/Moderate) is also computed for each metric-geography pair. The top row of the grid holds Employment, Bachelor's Degree+, and Broadband; the bottom row holds Homeownership, Unemployment, and Vacancy.

In [12]:
# 6.3 Bar chart comparison of key indicators by income group and geography

import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Metrics to compare and their subplot titles
gap_metrics_subplot = [
    ("Employment Rate", "Employment Rate (%)"),
    ("Bachelors_Plus_Rate", "Bachelor's Degree+ (%)"),
    ("Broadband Rate", "Broadband Access (%)"),
    ("Homeownership Rate", "Homeownership Rate (%)"),
    ("Unemployment Rate", "Unemployment Rate (%)"),
    ("Vacancy Rate", "Vacancy Rate (%)"),
]

# Build comparison data for each metric
subplot_data = {}

for metric, label in gap_metrics_subplot:
    metric_comparison = []

    for geo_name, geo_data in [
        ("Allegheny County", allegheny),
        ("Pennsylvania", pennsylvania),
        ("Nationwide", nationwide),
    ]:
        low_mod = geo_data[geo_data["Distress_Category"] == "Low/Moderate Income"][metric].mean()
        mid_upper = geo_data[geo_data["Distress_Category"] == "Middle/Upper Income"][metric].mean()
        gap = mid_upper - low_mod

        metric_comparison.append(
            {
                "Geography": geo_name,
                "Low/Moderate": low_mod,
                "Middle/Upper": mid_upper,
                "Gap": gap,
            }
        )

    subplot_data[metric] = pd.DataFrame(metric_comparison)

# Create 2x3 grid of subplots
fig = make_subplots(
    rows=2,
    cols=3,
    subplot_titles=[label for _, label in gap_metrics_subplot],
    vertical_spacing=0.15,
    horizontal_spacing=0.10,
)

positions = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]

# Add traces for each metric
for idx, ((metric, label), (row, col)) in enumerate(zip(gap_metrics_subplot, positions)):
    df_metric = subplot_data[metric]

    # Low/Moderate income bars
    fig.add_trace(
        go.Bar(
            name="Low/Moderate Income",
            x=df_metric["Geography"],
            y=df_metric["Low/Moderate"],
            marker_color="#e74c3c",
            text=df_metric["Low/Moderate"].round(1),
            texttemplate="%{text}",
            textposition="outside",
            textfont=dict(size=9),
            showlegend=(idx == 0),
            legendgroup="low_mod",
            hovertemplate="<b>%{x}</b><br>Low/Moderate: %{y:.1f}%<extra></extra>",
        ),
        row=row,
        col=col,
    )

    # Middle/Upper income bars
    fig.add_trace(
        go.Bar(
            name="Middle/Upper Income",
            x=df_metric["Geography"],
            y=df_metric["Middle/Upper"],
            marker_color="#3498db",
            text=df_metric["Middle/Upper"].round(1),
            texttemplate="%{text}",
            textposition="outside",
            textfont=dict(size=9),
            showlegend=(idx == 0),
            legendgroup="mid_upper",
            hovertemplate="<b>%{x}</b><br>Middle/Upper: %{y:.1f}%<extra></extra>",
        ),
        row=row,
        col=col,
    )

    # Axes styling per subplot
    fig.update_yaxes(
        title_text="%",
        title_font=dict(size=10),
        tickfont=dict(size=9),
        gridcolor="#ecf0f1",
        row=row,
        col=col,
    )

    fig.update_xaxes(
        tickfont=dict(size=9),
        tickangle=-45,
        row=row,
        col=col,
    )

# Overall layout
fig.update_layout(
    title=dict(
        text="Opportunity Gaps Across Key Economic Indicators",
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    annotations=list(fig.layout.annotations)
    + [
        dict(
            text=(
                "Comparison of Low/Moderate Income vs Middle/Upper Income census tracts "
                "across Allegheny County, Pennsylvania, and the U.S."
            ),
            x=0.5,
            y=-0.08,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    height=800,
    width=1200,
    showlegend=True,
    legend=dict(
        orientation="h",
        yanchor="bottom",
        y=1.05,
        xanchor="center",
        x=0.5,
        font=dict(size=12, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.8)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    template="plotly_white",
    barmode="group",
    margin=dict(t=140, b=100, l=60, r=60),
)

fig.show()

Interpretation:

While Allegheny County largely mirrors statewide and national patterns, a closer look reveals where local conditions differ—and where targeted intervention could make the biggest impact.

The education gap is severe everywhere, but Allegheny's upper-income tracts perform somewhat better (23.3% Bachelor's+) than their counterparts in Pennsylvania (20.7%) and nationwide (20.8%). This may reflect the concentration of universities and knowledge-economy jobs in Pittsburgh. Low/moderate-income tracts, however, average just 11.2%, a local divide that is particularly stark given the region's educational assets.

Broadband tells a more nuanced story. Allegheny's gap (2.6 percentage points) falls between Pennsylvania's (1.7) and the nation's (3.1), and Allegheny tracts report higher broadband rates than the state or the nation in both income groups. The concern is the absolute level: the average low/moderate-income tract in Allegheny has a 38.0% broadband rate, compared with 40.6% in middle/upper-income tracts. In a region marketing itself as a tech hub, that low baseline likely limits economic inclusion.

Unemployment and homeownership gaps are broadly in line with state and national figures (Allegheny's unemployment gap is somewhat larger), but the vacancy story is distinctly local. Allegheny's low/moderate-income tracts average 15.6% vacancy, higher than Pennsylvania (12.7%) or the nation (11.9%), likely a legacy of deindustrialization and population loss that continues to destabilize neighborhoods.

Policy Implication: Allegheny can't solve national structural inequities alone, but it can address local gaps. Stabilizing housing markets in low-income neighborhoods through anti-blight initiatives would target Allegheny's most distinctive challenge, vacancy, while expanding broadband would raise a baseline that remains low across all income groups.

6.4 Relationship Between Median Income and Employment Rate¶

• Description: Scatter plot with trendline showing relationship between median household income (x-axis) and employment rate (y-axis) for all census tracts, color-coded by geography.

• Objective: Test the correlation between income and employment outcomes, examining whether higher-income areas systematically show higher employment rates.

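• Methodology: Plot each census tract as a semi-transparent point (opacity 0.3) so overlapping points reveal density, after excluding tracts with non-positive or $250,000+ median incomes and employment rates of 0% or 100%. Fit a single ordinary least squares (OLS) trendline across all tracts. Color points by geography (Allegheny = blue, Pennsylvania = purple, Nationwide = orange) and include hover details for identifying individual tracts. The y-axis is limited to 70-100%.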

In [13]:
# 6.4 Income–employment relationship across geographies

import plotly.express as px
import plotly.graph_objects as go

# Assemble scatter data for all three geographies
scatter_data = []

for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    temp_df = geo_data[
        [
            "Median Household Income",
            "Employment Rate",
            "FFIEC Tract income level (2022)",
            "Tract Name",
        ]
    ].copy()
    temp_df["Geography"] = geo_name
    scatter_data.append(temp_df)

scatter_df = pd.concat(scatter_data, ignore_index=True)

# Filter out outliers and invalid values
scatter_df = scatter_df[
    (scatter_df["Median Household Income"] > 0)
    & (scatter_df["Median Household Income"] < 250_000)
    & (scatter_df["Employment Rate"] > 0)
    & (scatter_df["Employment Rate"] < 100)
]

# Scatter plot with overall OLS trendline
fig = px.scatter(
    scatter_df,
    x="Median Household Income",
    y="Employment Rate",
    color="Geography",
    color_discrete_map={
        "Allegheny County": "#1f77b4",
        "Pennsylvania": "#9467bd",
        "Nationwide": "#ff7f0e",
    },
    opacity=0.3,
    hover_data={
        "Tract Name": True,
        "FFIEC Tract income level (2022)": True,
        "Median Household Income": ":$,.0f",
        "Employment Rate": ":.1f",
        "Geography": True,
    },
    labels={
        "Median Household Income": "Median Household Income",
        "Employment Rate": "Employment Rate (%)",
    },
    trendline="ols",
    trendline_scope="overall",
    trendline_color_override="#2c3e50",
)

# Emphasize the trendline
for trace in fig.data:
    if getattr(trace, "mode", None) == "lines":
        trace.line.width = 3
        trace.line.dash = "dash"

fig.update_layout(
    title=dict(
        text=(
            "Nation-Level Relationship Between Median Income and Employment Rate"
            "<br><sub>Higher income tracts show stronger employment outcomes</sub>"
        ),
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    xaxis=dict(
        title="Median Household Income ($)",
        title_font=dict(size=14, family="Arial, sans-serif"),
        tickformat="$,.0f",
        gridcolor="#ecf0f1",
        tickfont=dict(size=11),
    ),
    yaxis=dict(
        title="Employment Rate (%)",
        title_font=dict(size=14, family="Arial, sans-serif"),
        gridcolor="#ecf0f1",
        range=[70, 100],
        tickfont=dict(size=11),
    ),
    height=600,
    width=1000,
    template="plotly_white",
    plot_bgcolor="white",
    legend=dict(
        title=dict(text="Geography", font=dict(size=13, family="Arial, sans-serif")),
        font=dict(size=12, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=1,
        x=0.02,
        y=0.98,
        xanchor="left",
        yanchor="top",
    ),
    annotations=[
        dict(
            text="Dashed line shows overall trend across all census tracts",
            x=0.5,
            y=-0.15,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    margin=dict(t=120, b=100, l=80, r=50),
)

fig.show()

Interpretation:

The scatter plot shows a clear positive relationship: as median household income rises, employment rates climb steadily. Read off the chart, the trendline rises by roughly 5-7 percentage points of employment for every $50,000 increase in median income. But the spread around the line is as revealing as the trend.
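The 5-7 point figure is an eyeball reading of the trendline. A minimal sketch of computing it directly, assuming the fit is an ordinary least squares line of employment rate on income as in the plot; `employment_gain_per_50k` is a hypothetical helper, and the data below are synthetic rather than drawn from the tracts:

```python
import numpy as np


def employment_gain_per_50k(income, employment_rate):
    """OLS slope of employment rate (percent) on income (dollars), scaled to $50,000."""
    income = np.asarray(income, dtype=float)
    employment_rate = np.asarray(employment_rate, dtype=float)
    # A degree-1 least-squares fit returns [slope, intercept]
    slope, _intercept = np.polyfit(income, employment_rate, deg=1)
    return slope * 50_000


# Synthetic check: the true slope is 0.00008 points per dollar, i.e. 4 points per $50k
rng = np.random.default_rng(0)
income = rng.uniform(20_000, 200_000, size=500)
employment = 80 + 0.00008 * income + rng.normal(0, 1.0, size=500)
gain = employment_gain_per_50k(income, employment)
print(f"Estimated gain per $50k: {gain:.2f} percentage points")
```

In the notebook, the call would be `employment_gain_per_50k(scatter_df["Median Household Income"], scatter_df["Employment Rate"])`. Plotly Express also exposes the statsmodels fit behind the dashed line through `px.get_trendline_results(fig)`, which should report the same slope per dollar.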

At the lower end, among tracts with median incomes below $50,000, employment rates scatter widely, from the bottom of the plotted range (70%; the axis clips lower values) to about 95%. This variation matters: it suggests that low income does not doom a community to poor employment outcomes. Some disadvantaged tracts reach employment rates matching or exceeding those of wealthier areas, which points to possible protective factors (strong local employers, workforce programs, transit access).

As income rises above $100,000, the scatter tightens. Nearly all affluent tracts cluster at 95-100% employment with little variation, consistent with wealthier areas having more stable and consistent access to jobs. The densest concentration of points sits in the $40,000-$80,000 range at 85-95% employment, which is where most tracts in the sample fall.

Geographically, Allegheny County (blue), Pennsylvania (purple), and nationwide (orange) tracts blend together seamlessly across the entire income spectrum, reinforcing that these dynamics aren't regionally unique.

Policy Implication: The wide variation among low-income tracts is encouraging—disadvantage isn't destiny. Identifying what differentiates high-performing low-income tracts from struggling ones could reveal replicable interventions that break the income-employment correlation.

6.5 Educational Attainment by Income Level¶

• Description: Three side-by-side stacked bar charts (one per geography) showing distribution of educational attainment (High School or Less, Some College, Associate Degree, Bachelor's+) across four income levels.

• Objective: Identify where the education gap is most pronounced and whether it's concentrated at specific credential levels or distributed across the full education spectrum.

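• Methodology: For each income level and geography, estimate the share of residents in each education category: Bachelor's+ is the mean of Bachelors_Plus_Rate; Associate Degree is associate-degree holders divided by total population; "Some College" is not measured directly and is approximated as 1.5 times the associate share; High School or Less is the remainder. Stack categories in a consistent color order (red = less education → blue = more education) and display percentages within the bars.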

In [14]:
# 6.5 Educational attainment by income level and geography

import plotly.graph_objects as go
from plotly.subplots import make_subplots

education_data = []

# Build approximate education breakdown by income level
for geo_name, geo_data in [
    ("Allegheny County", allegheny),
    ("Pennsylvania", pennsylvania),
    ("Nationwide", nationwide),
]:
    for income_level in ["Low", "Moderate", "Middle", "Upper"]:
        subset = geo_data[geo_data["FFIEC Tract income level (2022)"] == income_level]

        if len(subset) == 0:
            continue

        # Bachelor's+ share
        bachelors_plus = subset["Bachelors_Plus_Rate"].mean()

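        # Associate-degree share; the denominator is total population, not just
        # adults 25+, so this is lower than an adults-only rate would be
        associates = (subset["Associates Degree"] / subset["Total Population"] * 100).mean()

        # "Some College" is not in the data: rough heuristic of 1.5x the associate share
        some_college_estimate = associates * 1.5

        # Remainder = high school or less (absorbs any error in the estimates above)
        hs_or_less = 100 - bachelors_plus - associates - some_college_estimate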

        education_data.append(
            {
                "Geography": geo_name,
                "Income Level": income_level,
                "High School or Less": max(0, hs_or_less),
                "Some College": some_college_estimate,
                "Associate Degree": associates,
                "Bachelor's+": bachelors_plus,
            }
        )

education_df = pd.DataFrame(education_data)

# Stacked bars for each geography
fig = make_subplots(
    rows=1,
    cols=3,
    subplot_titles=["Allegheny County", "Pennsylvania", "Nationwide"],
    horizontal_spacing=0.08,
)

colors = {
    "High School or Less": "#e74c3c",
    "Some College": "#e67e22",
    "Associate Degree": "#f39c12",
    "Bachelor's+": "#3498db",
}

education_categories = [
    "High School or Less",
    "Some College",
    "Associate Degree",
    "Bachelor's+",
]

for col_idx, geo_name in enumerate(["Allegheny County", "Pennsylvania", "Nationwide"], 1):
    geo_subset = education_df[education_df["Geography"] == geo_name]

    for edu_cat in education_categories:
        fig.add_trace(
            go.Bar(
                name=edu_cat,
                x=geo_subset["Income Level"],
                y=geo_subset[edu_cat],
                marker_color=colors[edu_cat],
                showlegend=(col_idx == 1),
                legendgroup=edu_cat,
                text=geo_subset[edu_cat].round(1),
                texttemplate="%{text:.0f}%",
                textposition="inside",
                textfont=dict(size=9, color="white"),
                hovertemplate=(
                    "<b>%{x} Income</b><br>"
                    + edu_cat
                    + ": %{y:.1f}%<extra></extra>"
                ),
            ),
            row=1,
            col=col_idx,
        )

    fig.update_xaxes(
        categoryorder="array",
        categoryarray=["Low", "Moderate", "Middle", "Upper"],
        tickfont=dict(size=11),
        row=1,
        col=col_idx,
    )

    fig.update_yaxes(
        title_text="Percentage (%)" if col_idx == 1 else "",
        title_font=dict(size=12),
        tickfont=dict(size=10),
        range=[0, 100],
        gridcolor="#ecf0f1",
        row=1,
        col=col_idx,
    )

fig.update_layout(
    title=dict(
        text="Educational Attainment by Income Level",
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    barmode="stack",
    height=550,
    width=1300,
    template="plotly_white",
    legend=dict(
        title=dict(text="Education Level", font=dict(size=13)),
        font=dict(size=12, family="Arial, sans-serif"),
        orientation="h",
        yanchor="bottom",
        y=1.1,
        xanchor="center",
        x=0.5,
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    annotations=list(fig.layout.annotations)
    + [
        dict(
            text=(
                "Lower-income tracts show higher shares of red/orange (lower attainment), "
                "while upper-income tracts show more blue (Bachelor's+)."
            ),
            x=0.5,
            y=-0.12,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d"),
        )
    ],
    margin=dict(t=150, b=100, l=70, r=50),
)

fig.show()

print("✓ Education attainment visualization created!")
✓ Education attainment visualization created!

Interpretation:

The education divide is steep. In low-income tracts across all three geographies, 77-83% of residents have a high school education or less (red dominates the bars). In upper-income tracts, Bachelor's degrees or higher (blue) account for 21-23% of the population and the high-school-or-less share drops to 62-65%, although it remains the majority.

What's striking is where the gap isn't. The "Some College" (orange) and "Associate Degree" (amber) segments show little variation across income levels, hovering around 8-13% regardless of tract wealth. Because "Some College" is estimated as 1.5 times the associate share rather than measured, the two segments move together by construction, so this flatness should be read cautiously. Even so, the pattern suggests that starting college or earning an Associate's degree does not strongly track tract income; the critical threshold appears to be the Bachelor's degree.
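Because "Some College" is not measured in this notebook (it is set to 1.5 times the associate share, and the associate share is divided by total population rather than by adults 25 and over), the four segments do not share a common denominator. A minimal sketch of computing all four from counts of adults 25+, assuming such counts are available; in ACS data they would come from table B15003 (educational attainment for the population 25 years and over). The column names and demo values below are hypothetical:

```python
import pandas as pd

# Hypothetical count columns (persons aged 25+ in each category)
CATEGORY_COLUMNS = {
    "High School or Less": "hs_or_less_count",
    "Some College": "some_college_count",
    "Associate Degree": "associate_count",
    "Bachelor's+": "bachelors_plus_count",
}


def education_shares(tracts: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Percent of the 25+ population in each category, pooled within each group."""
    count_cols = list(CATEGORY_COLUMNS.values())
    # Sum counts within each group first so all four categories share one denominator
    totals = tracts.groupby(group_col)[count_cols].sum()
    shares = totals.div(totals.sum(axis=1), axis=0) * 100
    # Map count-column names back to the display labels used in the chart
    return shares.rename(columns={col: label for label, col in CATEGORY_COLUMNS.items()})


demo = pd.DataFrame({
    "Income Level": ["Low", "Low", "Upper"],
    "hs_or_less_count": [600, 400, 300],
    "some_college_count": [200, 100, 200],
    "associate_count": [100, 100, 100],
    "bachelors_plus_count": [100, 400, 400],
})
shares = education_shares(demo, "Income Level")
print(shares.round(1))
```

Pooling counts before dividing yields population-weighted shares that sum to exactly 100% per income level, so no category has to act as a residual.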

Allegheny County shows a slightly sharper education gradient than Pennsylvania or the nation. Upper-income Allegheny tracts have 23% Bachelor's+ attainment, compared with 21% statewide and nationally, likely reflecting Pittsburgh's concentration of universities and professional employers. But low-income Allegheny tracts mirror the national pattern at 77% high school or less, suggesting the region's educational assets are not reaching disadvantaged communities.

The moderate- and middle-income categories (the center two bars) show intermediate patterns, so this is a gradient rather than a binary split. The Low-to-Upper contrast is nonetheless stark, even though high school or less remains the majority category at every income level.

Policy Implication: In these estimates, Associate's degrees and "some college" do not appear to be closing opportunity gaps. If education policy aims to improve economic mobility, the focus should be on Bachelor's degree completion, not just college access.

6.6 Infrastructure Access Gaps in Allegheny County¶

• Description: Grouped bar chart comparing Broadband Access and Public Transit Use across four income levels, focused exclusively on Allegheny County tracts.

• Objective: Examine two critical infrastructure dimensions—digital access (broadband) and physical mobility (transit)—to assess whether infrastructure gaps correlate with income classification.

• Methodology: Calculate the mean Broadband Rate and Transit Rate for each income level within Allegheny County (an unweighted average across tracts). Display as grouped bars with Broadband (blue) and Transit (orange) side by side for each income category.
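Averaging tract percentages gives every tract equal weight, so a small tract counts as much as a large one and the result describes the typical tract rather than the typical resident. A minimal sketch of a weighted alternative, assuming a count column to weight by; the `Households` column and demo values below are hypothetical, and commuting rates would more naturally be weighted by workers:

```python
import pandas as pd


def weighted_rate_by_group(tracts, rate_col, weight_col, group_col):
    """Weighted mean of a percentage column within each group."""
    valid = tracts.dropna(subset=[rate_col, weight_col])
    # Sum of (rate x weight) divided by the sum of weights, per group
    weighted_sum = (valid[rate_col] * valid[weight_col]).groupby(valid[group_col]).sum()
    weight_total = valid[weight_col].groupby(valid[group_col]).sum()
    return weighted_sum / weight_total


demo = pd.DataFrame({
    "Income Level": ["Low", "Low", "Upper"],
    "Broadband Rate": [20.0, 50.0, 40.0],
    "Households": [100, 300, 200],  # hypothetical weight column
})
weighted = weighted_rate_by_group(demo, "Broadband Rate", "Households", "Income Level")
unweighted = demo.groupby("Income Level")["Broadband Rate"].mean()
print(pd.DataFrame({"unweighted": unweighted, "weighted": weighted}))
```

In the demo the two low-income tracts average 35% unweighted but 42.5% weighted, because the larger tract has the higher rate; statements phrased as "X% of low-income residents" correspond to the weighted version.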

In [15]:
# 6.6 Infrastructure access gaps within Allegheny County

import plotly.graph_objects as go

# Aggregate broadband and transit use by income level (Allegheny only)
infrastructure_data = []

for income_level in ["Low", "Moderate", "Middle", "Upper"]:
    subset = allegheny[allegheny["FFIEC Tract income level (2022)"] == income_level]

    if len(subset) == 0:
        continue

    infrastructure_data.append(
        {
            "Income Level": income_level,
            "Broadband Access": subset["Broadband Rate"].mean(),
            "Public Transit Use": subset["Transit Rate"].mean(),
            "n_tracts": len(subset),
        }
    )

infra_df = pd.DataFrame(infrastructure_data)

# Grouped bar chart: broadband vs transit by income level
fig = go.Figure()

fig.add_trace(
    go.Bar(
        name="Broadband Access",
        x=infra_df["Income Level"],
        y=infra_df["Broadband Access"],
        marker_color="#3498db",
        text=infra_df["Broadband Access"].round(1).astype(str) + "%",
        textposition="outside",
        textfont=dict(size=11),
        hovertemplate=(
            "<b>%{x} Income</b><br>"
            "Broadband Access: %{y:.1f}%<extra></extra>"
        ),
    )
)

fig.add_trace(
    go.Bar(
        name="Public Transit Use",
        x=infra_df["Income Level"],
        y=infra_df["Public Transit Use"],
        marker_color="#e67e22",
        text=infra_df["Public Transit Use"].round(1).astype(str) + "%",
        textposition="outside",
        textfont=dict(size=11),
        hovertemplate=(
            "<b>%{x} Income</b><br>"
            "Public Transit Use: %{y:.1f}%<extra></extra>"
        ),
    )
)

fig.update_layout(
    title=dict(
        text="Infrastructure Access Gaps in Allegheny County",
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    xaxis=dict(
        title="Income Level",
        title_font=dict(size=14),
        categoryorder="array",
        categoryarray=["Low", "Moderate", "Middle", "Upper"],
        tickfont=dict(size=12),
    ),
    yaxis=dict(
        title="Percentage (%)",
        title_font=dict(size=14),
        tickfont=dict(size=11),
        gridcolor="#ecf0f1",
        range=[
            0,
            max(
                infra_df["Broadband Access"].max(),
                infra_df["Public Transit Use"].max(),
            )
            + 10,
        ],
    ),
    barmode="group",
    height=550,
    width=900,
    template="plotly_white",
    plot_bgcolor="white",
    legend=dict(
        title=dict(text="Infrastructure Type", font=dict(size=13)),
        font=dict(size=12, family="Arial, sans-serif"),
        orientation="h",
        yanchor="bottom",
        y=1.08,
        xanchor="center",
        x=0.5,
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=1,
    ),
    annotations=[
        dict(
            text=(
                "Broadband access is lowest in low-income tracts, while transit use "
                "is highest there."
            ),
            x=0.5,
            y=-0.15,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    margin=dict(t=130, b=100, l=80, r=50),
)

fig.show()

# Simple numeric summary for the narrative
print("Infrastructure gap summary (Allegheny County):")
broadband_low = infra_df.loc[infra_df["Income Level"] == "Low", "Broadband Access"].values[0]
broadband_upper = infra_df.loc[infra_df["Income Level"] == "Upper", "Broadband Access"].values[0]
transit_low = infra_df.loc[infra_df["Income Level"] == "Low", "Public Transit Use"].values[0]
transit_upper = infra_df.loc[infra_df["Income Level"] == "Upper", "Public Transit Use"].values[0]

print(f"- Broadband gap (Upper - Low): {broadband_upper - broadband_low:.1f} percentage points")
print(f"- Transit use: higher in {'lower' if transit_low > transit_upper else 'upper'}-income areas")
Infrastructure gap summary (Allegheny County):
- Broadband gap (Upper - Low): 5.8 percentage points
- Transit use: higher in lower-income areas

Interpretation:

Allegheny County faces a clear digital divide: the average low-income tract has a broadband rate of 33.6%, compared with 39.3% in upper-income tracts, a gap of nearly 6 percentage points. That gap may look modest, but it likely represents thousands of households with limited access to remote work, telehealth, online education, and essential digital services. Even the highest rate (41.7%, in middle-income tracts) is strikingly low, which suggests broadband infrastructure lags across the county. Because middle-income tracts outperform upper-income ones, the relationship with income is also not strictly monotonic.

Transit use tells the opposite story, pointing to dependency rather than access. Low-income tracts average 7.6% transit commuting, nearly triple the 2.6% rate in upper-income areas, and the rate falls steadily from 7.6% (low) to 5.3% (moderate), 3.6% (middle), and 2.6% (upper). This inverse relationship most likely reflects necessity rather than preference: residents of disadvantaged neighborhoods are more likely to lack a vehicle, while affluent residents mostly drive.

The combination is problematic. Low-income residents depend heavily on transit for basic mobility yet have the lowest broadband access, which is needed for flexible work arrangements and online job applications. The barriers compound: workers who cannot work remotely because of poor internet access need jobs they can physically reach, which for transit-dependent workers limits options to the places the bus serves.

Policy Implication: Infrastructure investments should be bundled. Expanding broadband in low-income neighborhoods while simultaneously improving transit frequency and coverage would address complementary mobility barriers. One without the other leaves gaps unfilled.

6.7 Income Classification by Census Tract - Allegheny County¶

• Description: Interactive choropleth map of Allegheny County showing FFIEC income classification (Low, Moderate, Middle, Upper) for each census tract with census boundaries visible.

• Objective: Visualize the geographic distribution of income levels to identify spatial clustering patterns and test whether low-income tracts concentrate in specific areas of the county.

• Methodology: Merge census tract shapefiles with income classification data using the 11-digit GEOID (2-digit state + 3-digit county + 6-digit tract code). Color-code tracts by income level (Red = Low, Orange = Moderate, Yellow = Middle, Blue = Upper; tracts without a match are shown in gray as Unknown). Enable hover tooltips showing tract details.
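The merge only works if both sides produce identical 11-character strings. If the `state`, `county`, or `Tract Code (6-digit)` columns were ever read as floats, `.astype(str)` yields values like "42.0", and every affected tract silently falls through to "Unknown". A minimal sketch of a safer key builder plus an unmatched-row count using pandas' `merge(..., indicator=True)`; the function names and demo values are hypothetical:

```python
import pandas as pd


def build_geoid(state, county, tract):
    """Zero-pad state (2), county (3), and tract (6) codes into an 11-digit GEOID."""
    # Cast through int so codes stored as floats (42.0) don't become "42.0";
    # this assumes no missing codes, since NaN cannot be cast to int
    return (
        state.astype(int).astype(str).str.zfill(2)
        + county.astype(int).astype(str).str.zfill(3)
        + tract.astype(int).astype(str).str.zfill(6)
    )


def count_unmatched(shapes, data, key="GEOID_full"):
    """Number of shape rows that find no data row in a left merge on `key`."""
    merged = shapes.merge(data[[key]].drop_duplicates(), on=key, how="left", indicator=True)
    return int((merged["_merge"] == "left_only").sum())


# Codes stored as floats, as can happen after a CSV round trip (hypothetical values)
data = pd.DataFrame({"state": [42.0, 42.0], "county": [3.0, 3.0], "tract": [140100.0, 980000.0]})
naive_state = data["state"].astype(str).str.zfill(2)  # "42.0", which would break the key
data["GEOID_full"] = build_geoid(data["state"], data["county"], data["tract"])

shapes = pd.DataFrame({"GEOID_full": ["42003140100", "42003010300"]})
print(data["GEOID_full"].tolist())
print("Unmatched shapes:", count_unmatched(shapes, data))
```

Printing the unmatched count right after the merge makes a silent key mismatch visible before it shows up as gray tracts on the map.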

In [16]:
# 6.7 Choropleth: income classification by census tract (Allegheny County)

import geopandas as gpd
import plotly.express as px
import plotly.graph_objects as go

# Path to Pennsylvania tract shapefile (2022)
shapefile_path = "/Users/sofiahutton/Documents/Fall 2025 CMU Classes/visualizations with python /tl_2022_42_tract.shp"

# Load tracts and filter to Allegheny County (county FIPS = 003)
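# Reproject from NAD83 (the TIGER default) to WGS84 lon/lat, which web map tiles expect
gdf = gpd.read_file(shapefile_path).to_crs(epsg=4326)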
gdf_allegheny = gdf[gdf["COUNTYFP"] == "003"].copy()

# TIGER's GEOID is already the 11-digit state + county + tract code
gdf_allegheny["GEOID_full"] = gdf_allegheny["GEOID"]

allegheny_for_map = allegheny.copy()
allegheny_for_map["GEOID_full"] = (
    allegheny_for_map["state"].astype(str).str.zfill(2)
    + allegheny_for_map["county"].astype(str).str.zfill(3)
    + allegheny_for_map["Tract Code (6-digit)"].astype(str).str.zfill(6)
)

# Merge shapefile with tract-level indicators
gdf_merged = gdf_allegheny.merge(
    allegheny_for_map[
        [
            "GEOID_full",
            "FFIEC Tract income level (2022)",
            "Tract Name",
            "Employment Rate",
            "Median Household Income",
            "Bachelors_Plus_Rate",
        ]
    ],
    on="GEOID_full",
    how="left",
)

# Income-level colors
color_map = {
    "Low": "#e74c3c",
    "Moderate": "#e67e22",
    "Middle": "#f39c12",
    "Upper": "#3498db",
    "Unknown": "#95a5a6",
}

# Fill missing classifications
gdf_merged["FFIEC Tract income level (2022)"] = gdf_merged[
    "FFIEC Tract income level (2022)"
].fillna("Unknown")

# Center map around tract centroids
center_lat = gdf_merged.geometry.centroid.y.mean()
center_lon = gdf_merged.geometry.centroid.x.mean()

fig = px.choropleth_mapbox(
    gdf_merged,
    geojson=gdf_merged.geometry,
    locations=gdf_merged.index,
    color="FFIEC Tract income level (2022)",
    color_discrete_map=color_map,
    category_orders={
        "FFIEC Tract income level (2022)": ["Low", "Moderate", "Middle", "Upper", "Unknown"]
    },
    mapbox_style="carto-positron",
    center={"lat": center_lat, "lon": center_lon},
    zoom=9,
    opacity=0.7,
    hover_data={
        "Tract Name": True,
        "FFIEC Tract income level (2022)": True,
        "Employment Rate": ":.1f",
        "Median Household Income": ":$,.0f",
        "Bachelors_Plus_Rate": ":.1f",
    },
    labels={
        "FFIEC Tract income level (2022)": "Income Level",
        "Employment Rate": "Employment Rate (%)",
        "Median Household Income": "Median Income",
        "Bachelors_Plus_Rate": "Bachelor's Degree+ (%)",
    },
)

fig.update_layout(
    title=dict(
        text="Income Classification by Census Tract<br><sub>Allegheny County, Pennsylvania</sub>",
        x=0.5,
        xanchor="center",
        font=dict(size=24, family="Arial, sans-serif", color="#2c3e50"),
    ),
    margin=dict(r=0, t=80, l=0, b=0),
    height=700,
    legend=dict(
        title=dict(text="FFIEC Income Level", font=dict(size=14)),
        font=dict(size=12, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.9)",
        bordercolor="#bdc3c7",
        borderwidth=2,
        x=0.02,
        y=0.98,
        xanchor="left",
        yanchor="top",
    ),
)

fig.show()

Interpretation:

Allegheny County's economic geography shows clear spatial patterns. Low-income tracts (red) and moderate-income tracts (orange) concentrate heavily in the urban core and inner-ring areas, while upper-income tracts (blue) dominate the northern portions of the county and select areas to the south and east. Middle-income tracts (yellow) form transition zones between these extremes.

Visually, the clustering is unmistakable: disadvantage is not randomly distributed across the county. Low- and moderate-income areas group together in connected clusters, particularly in the central and eastern portions of the map, and upper-income tracts form their own contiguous zones, most prominently in the northern third of the county.
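Whether the clustering exceeds what chance would produce can be quantified with global Moran's I, which sits near -1/(n-1) when values are spatially random, turns positive when similar values sit next to each other, and turns negative when they alternate. A minimal sketch on a toy strip of six tracts, coding low/moderate income as 1; `morans_i` is a hypothetical helper and the sketch omits significance testing. For the real map, contiguity weights could be built from `gdf_merged` with PySAL (`libpysal.weights.Queen.from_dataframe`), whose `esda.Moran` class also reports a permutation-based p-value; that wiring is not shown here.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for one value per areal unit and an n x n spatial weights matrix.

    weights[i, j] > 0 marks i and j as neighbors; the diagonal should be zero.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()           # deviations from the mean
    s0 = w.sum()               # total of all weights
    numerator = z @ w @ z      # sum over i, j of w_ij * z_i * z_j
    denominator = (z ** 2).sum()
    return (x.size / s0) * numerator / denominator


# Toy example: six tracts in a row, each adjacent to its immediate neighbors
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

clustered = [1, 1, 1, 0, 0, 0]    # 1 = low/moderate income, grouped together
alternating = [1, 0, 1, 0, 1, 0]  # same count, maximally interleaved
print(f"Clustered:   I = {morans_i(clustered, W):.2f}")
print(f"Alternating: I = {morans_i(alternating, W):.2f}")
print(f"Expected under randomness: {-1 / (n - 1):.2f}")
```

The clustered arrangement scores well above the random expectation and the alternating one well below it, which is the distinction the map is being read for.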

This spatial segregation means that residents of low-income neighborhoods don't just face individual economic challenges; they are surrounded by communities facing similar barriers. Conversely, upper-income areas likely benefit from a geographic concentration of resources, quality schools, commercial investment, and accumulated wealth, factors this analysis does not measure directly.

The moderate and middle-income tracts (orange and yellow) create buffer zones but don't eliminate the fundamental pattern: Allegheny County is economically segregated by geography.

Policy Implication: The clustering of disadvantage suggests place-based interventions could be highly efficient. Rather than scattering resources across the entire county, targeted investment in these visible low-income clusters could reach large populations facing similar challenges. However, it also raises questions about whether strategies should focus on neighborhood revitalization or economic mobility that helps residents access opportunities elsewhere.

6.8 Compound Disadvantage Hot Spots in Allegheny County¶

• Description: Choropleth map of Allegheny County with census tracts color-coded by quartile on a composite disadvantage score that combines six economic indicators. Unlike the income classification map, which categorizes tracts by median income alone, this visualization measures multi-dimensional outcomes to identify where disadvantages compound across employment, education, housing, and infrastructure simultaneously.

• Objective: Identify specific geographic areas facing compounding disadvantages across multiple dimensions, pinpointing priority zones for comprehensive policy intervention. By moving beyond single metrics or income categories, this map reveals which neighborhoods face the most severe cumulative challenges and therefore need the most urgent, holistic support.

• Methodology: Build a composite disadvantage score from six percentage metrics (unemployment rate, vacancy rate, employment rate, bachelor's+ rate, broadband rate, homeownership rate), each rescaled to 0-1 by dividing by 100. The last four are inverted (100 minus the rate) so that higher values always indicate worse outcomes. Average the six scores and multiply by 100 to get a compound disadvantage index for each tract. Divide all Allegheny County tracts into quartiles on this index and map them with a discrete color scale: Green = lowest disadvantage quartile (best 25%), Yellow = lower-middle quartile, Orange = upper-middle quartile, Dark Red = highest disadvantage quartile (worst 25%). Because each quartile holds the same number of tracts, this approach maximizes visual contrast and identifies priority zones regardless of absolute score values.
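The notebook's code divides each percentage by 100, which keeps every component between 0 and 1 but does not make the components span the same range. A component that varies widely across tracts, such as bachelor's attainment, therefore moves the average far more than unemployment, which varies over a few points. If equal influence is the goal, min-max scaling is one common alternative. This is a swapped-in technique, not the notebook's method, and `minmax_composite` and the demo values are hypothetical:

```python
import pandas as pd


def minmax_composite(df, higher_is_worse, lower_is_worse):
    """Average of min-max scaled components, oriented so 100 = worst tract on every component."""
    scaled = pd.DataFrame(index=df.index)
    for col in higher_is_worse + lower_is_worse:
        values = df[col].astype(float)
        if col in lower_is_worse:
            values = -values  # flip so larger always means worse
        spread = values.max() - values.min()
        # A constant column carries no information; score it 0 for every tract
        scaled[col] = (values - values.min()) / spread if spread > 0 else 0.0
    return scaled.mean(axis=1) * 100


demo = pd.DataFrame({
    "Unemployment Rate": [2.0, 4.0, 10.0],
    "Bachelors_Plus_Rate": [60.0, 30.0, 10.0],
})
composite = minmax_composite(demo, ["Unemployment Rate"], ["Bachelors_Plus_Rate"])
print(composite.round(1).tolist())
```

Min-max scaling is sensitive to extreme tracts, since a single outlier sets the endpoint for everyone else; screening implausible tracts first, or using percentile ranks instead, reduces that effect.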

In [17]:
# 6.8 Compound disadvantage hot spots in Allegheny County

import geopandas as gpd
import plotly.express as px
import numpy as np

# Load Pennsylvania tract shapefile and filter to Allegheny County
shapefile_path = "/Users/sofiahutton/Documents/Fall 2025 CMU Classes/visualizations with python /tl_2022_42_tract.shp"

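# Reproject from NAD83 (the TIGER default) to WGS84 lon/lat, which web map tiles expect
gdf = gpd.read_file(shapefile_path).to_crs(epsg=4326)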
gdf_allegheny = gdf[gdf["COUNTYFP"] == "003"].copy()

# Create compound disadvantage score for Allegheny tracts
allegheny_hotspot = allegheny.copy()

# Rescale percentages to 0-1 (divide by 100); invert "good" indicators so higher = worse outcome
allegheny_hotspot["unemployment_score"] = allegheny_hotspot["Unemployment Rate"] / 100
allegheny_hotspot["vacancy_score"] = allegheny_hotspot["Vacancy Rate"] / 100
allegheny_hotspot["low_employment_score"] = (100 - allegheny_hotspot["Employment Rate"]) / 100
allegheny_hotspot["low_education_score"] = (100 - allegheny_hotspot["Bachelors_Plus_Rate"]) / 100
allegheny_hotspot["low_broadband_score"] = (100 - allegheny_hotspot["Broadband Rate"]) / 100
allegheny_hotspot["low_homeownership_score"] = (100 - allegheny_hotspot["Homeownership Rate"]) / 100

disadvantage_components = [
    "unemployment_score",
    "vacancy_score",
    "low_employment_score",
    "low_education_score",
    "low_broadband_score",
    "low_homeownership_score",
]

# Average across components and rescale to 0–100
allegheny_hotspot["Disadvantage_Score"] = allegheny_hotspot[disadvantage_components].mean(axis=1) * 100

# Distribution check
print("Disadvantage score distribution (Allegheny tracts):")
print(f"- Min:              {allegheny_hotspot['Disadvantage_Score'].min():.1f}")
print(f"- 25th percentile:  {allegheny_hotspot['Disadvantage_Score'].quantile(0.25):.1f}")
print(f"- Median:           {allegheny_hotspot['Disadvantage_Score'].median():.1f}")
print(f"- 75th percentile:  {allegheny_hotspot['Disadvantage_Score'].quantile(0.75):.1f}")
print(f"- Max:              {allegheny_hotspot['Disadvantage_Score'].max():.1f}")

# Quartile categories for clearer visual breakpoints
allegheny_hotspot["Disadvantage_Quartile"] = pd.qcut(
    allegheny_hotspot["Disadvantage_Score"],
    q=4,
    labels=["Lowest 25%", "Lower-Middle 25%", "Upper-Middle 25%", "Highest 25%"],
)

print("\nQuartile counts:")
print(allegheny_hotspot["Disadvantage_Quartile"].value_counts().sort_index())

# Merge with Allegheny tracts shapefile
gdf_allegheny["GEOID_full"] = gdf_allegheny["GEOID"]
allegheny_hotspot["GEOID_full"] = (
    allegheny_hotspot["state"].astype(str).str.zfill(2)
    + allegheny_hotspot["county"].astype(str).str.zfill(3)
    + allegheny_hotspot["Tract Code (6-digit)"].astype(str).str.zfill(6)
)

gdf_hotspot = gdf_allegheny.merge(
    allegheny_hotspot[
        [
            "GEOID_full",
            "Disadvantage_Score",
            "Disadvantage_Quartile",
            "Tract Name",
            "Employment Rate",
            "Unemployment Rate",
            "Bachelors_Plus_Rate",
            "Broadband Rate",
            "Homeownership Rate",
            "FFIEC Tract income level (2022)",
        ]
    ],
    on="GEOID_full",
    how="left",
)

# Discrete color scale by quartile (green = best, dark red = worst)
color_map_discrete = {
    "Lowest 25%": "#2ecc71",        # Green (least disadvantaged)
    "Lower-Middle 25%": "#f39c12",  # Yellow
    "Upper-Middle 25%": "#e67e22",  # Orange
    "Highest 25%": "#c0392b",       # Dark red (most disadvantaged)
}

fig = px.choropleth_mapbox(
    gdf_hotspot,
    geojson=gdf_hotspot.geometry,
    locations=gdf_hotspot.index,
    color="Disadvantage_Quartile",
    color_discrete_map=color_map_discrete,
    category_orders={
        "Disadvantage_Quartile": [
            "Lowest 25%",
            "Lower-Middle 25%",
            "Upper-Middle 25%",
            "Highest 25%",
        ]
    },
    mapbox_style="carto-positron",
    center={
        "lat": gdf_hotspot.geometry.centroid.y.mean(),
        "lon": gdf_hotspot.geometry.centroid.x.mean(),
    },
    zoom=9,
    opacity=0.8,
    hover_data={
        "Tract Name": True,
        "Disadvantage_Score": ":.1f",
        "Disadvantage_Quartile": True,
        "FFIEC Tract income level (2022)": True,
        "Employment Rate": ":.1f",
        "Unemployment Rate": ":.1f",
        "Bachelors_Plus_Rate": ":.1f",
        "Broadband Rate": ":.1f",
        "Homeownership Rate": ":.1f",
    },
    labels={
        "Disadvantage_Quartile": "Disadvantage Level",
        "Disadvantage_Score": "Composite Score",
        "FFIEC Tract income level (2022)": "Income Level",
        "Employment Rate": "Employment (%)",
        "Unemployment Rate": "Unemployment (%)",
        "Bachelors_Plus_Rate": "Bachelor's+ (%)",
        "Broadband Rate": "Broadband (%)",
        "Homeownership Rate": "Homeownership (%)",
    },
)

fig.update_layout(
    title=dict(
        text=(
            "Compound Disadvantage Hot Spots in Allegheny County"
            "<br><sub>Dark red = top 25% most disadvantaged tracts (multiple overlapping challenges)</sub>"
        ),
        x=0.5,
        xanchor="center",
        font=dict(size=22, family="Arial, sans-serif", color="#2c3e50"),
    ),
    margin=dict(r=0, t=100, l=0, b=0),
    height=700,
    legend=dict(
        title=dict(text="Disadvantage Level", font=dict(size=13)),
        font=dict(size=12, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.95)",
        bordercolor="#bdc3c7",
        borderwidth=2,
        x=0.02,
        y=0.98,
        xanchor="left",
        yanchor="top",
    ),
)

fig.show()

print("\nHot spot map generated: dark red tracts represent the highest composite disadvantage; green tracts the lowest.")
Disadvantage score distribution (Allegheny tracts):
- Min:              18.4
- 25th percentile:  29.0
- Median:           32.4
- 75th percentile:  36.7
- Max:              100.0

Quartile counts:
Disadvantage_Quartile
Lowest 25%          97
Lower-Middle 25%    96
Upper-Middle 25%    96
Highest 25%         97
Name: count, dtype: int64
Hot spot map generated: dark red tracts represent the highest composite disadvantage; green tracts the lowest.
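The code that builds `Disadvantage_Score` and `Disadvantage_Quartile` is not shown in this section. The sketch below shows one common approach, offered as an assumption rather than the notebook's actual formula. It scales each indicator to 0-100 and flips indicators where higher is better, so that a higher value always means more disadvantage. It then averages the indicators and splits the result into four equal-count groups with pandas' `qcut`. The helper `composite_disadvantage` and the toy data are hypothetical.

```python
import pandas as pd

# Labels matching the map legend above
QUARTILE_LABELS = ["Lowest 25%", "Lower-Middle 25%", "Upper-Middle 25%", "Highest 25%"]


def composite_disadvantage(df: pd.DataFrame, good_cols: list, bad_cols: list) -> pd.Series:
    """Hypothetical composite: mean of 0-100 min-max scaled indicators.

    Each indicator is oriented so that a higher value means more disadvantage.
    """
    parts = []
    for col in good_cols + bad_cols:
        x = df[col]
        scaled = (x - x.min()) / (x.max() - x.min()) * 100
        # Flip "good" indicators (e.g., employment) so low values count as disadvantage
        parts.append(100 - scaled if col in good_cols else scaled)
    return pd.concat(parts, axis=1).mean(axis=1)


# Toy data: eight tracts ordered from least to most disadvantaged (not real ACS values)
toy = pd.DataFrame({
    "Employment Rate": [96, 95, 94, 93, 92, 91, 90, 89],
    "Unemployment Rate": [4, 5, 6, 7, 8, 9, 10, 11],
})
toy["Disadvantage_Score"] = composite_disadvantage(
    toy, good_cols=["Employment Rate"], bad_cols=["Unemployment Rate"]
)
# pd.qcut cuts at the sample quartiles, so each label gets (nearly) the same number of tracts
toy["Disadvantage_Quartile"] = pd.qcut(toy["Disadvantage_Score"], q=4, labels=QUARTILE_LABELS)
print(toy["Disadvantage_Quartile"].value_counts(sort=False))
```

Equal-count binning is consistent with the nearly equal quartile counts reported above (97/96/96/97). A min-max composite is sensitive to extreme values in any single indicator; percentile ranks are a common, more robust alternative.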

Interpretation:

Building on the income classification patterns from the previous map, this hot-spot visualization adds precision: it's not just income level that matters, but the intensity of compounding challenges. While the income map showed where low-income tracts cluster, this map pinpoints which of those areas face the most severe multi-dimensional disadvantage.

The 25% most disadvantaged tracts (dark red) concentrate in zones that overlap the low/moderate income clusters identified earlier, particularly in the central urban core. But the compound disadvantage score adds nuance: not all low-income areas face equal challenges. Some moderate-income tracts (orange/yellow on the previous map) show up as orange or even red here, indicating they struggle across employment, education, and infrastructure despite not being classified as "low income" by FFIEC standards.

Conversely, the green zones (lowest disadvantage) align closely with the blue upper-income corridors in the north, suggesting that affluent areas don't just have higher incomes; they score well across all measured dimensions at once.

The quartile approach reveals transition zones that weren't obvious in simple income classification. Some areas in the yellow/orange range face moderate compound disadvantage—they're not crisis zones, but they're vulnerable and could tip either direction depending on policy intervention or disinvestment.

Policy Implication: While income classification helps target resources, this compound disadvantage score identifies which low-income areas need the most urgent, comprehensive intervention. The dark red hot spots should be first-priority zones for bundled strategies addressing employment, education, broadband, and housing stability simultaneously.

6.9 "Success Stories": Low/Moderate Income Tracts Beating the Odds

• Description: This visualization identifies “success story” census tracts within Allegheny County—low- and moderate-income neighborhoods that achieve stronger employment or education outcomes than would typically be expected for their income tier. By shifting attention from deficits to positive outliers, the visualization highlights communities that demonstrate resilience or upward mobility despite structural disadvantage.

• Objective: The goal is to identify which disadvantaged tracts outperform the middle/upper-income benchmark and to lay the groundwork for asking why. Highlighting these tracts provides a foundation for exploring the local conditions, assets, or interventions that may contribute to stronger-than-expected outcomes. This perspective helps inform place-based strategies by showing where economic opportunity appears to be taking hold, and what lessons might be transferable to other communities facing similar constraints.

• Methodology: The analysis compares employment and educational outcomes in low/moderate-income tracts directly to the average outcomes of middle/upper-income tracts in Allegheny County.

- Tracts with employment rates at or above the middle/upper-income average are flagged as strong employment performers.

- Tracts with bachelor's-degree attainment at or above the middle/upper-income average attainment rate are identified as strong education performers.

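Combining these criteria yields a four-category performance typology:

- Excelling: meets both the employment and education benchmarks.

- Strong Employment: meets the employment benchmark only.

- Strong Education: meets the education benchmark only.

- Struggling: meets neither benchmark.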

This simple comparative framework allows disadvantaged tracts to be evaluated against a realistic performance target rooted in local economic conditions, making it easier to identify neighborhoods that are truly exceeding expectations.
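The Low/Moderate versus Middle/Upper split used here builds on FFIEC tract income levels. These levels compare a tract's median family income with the median family income of its area: low is below 50%, moderate is 50% to under 80%, middle is 80% to under 120%, and upper is 120% or more. The sketch below encodes those thresholds. The function names are hypothetical. The two-group collapse into `Distress_Category` labels is an assumption about how the notebook's categories were derived, not code taken from it.

```python
def ffiec_income_level(tract_mfi_pct: float) -> str:
    """Classify a tract by its median family income as a percent of the area median.

    Thresholds follow the CRA income-level definitions:
    low < 50%, moderate 50% to < 80%, middle 80% to < 120%, upper >= 120%.
    """
    if tract_mfi_pct < 50:
        return "Low"
    if tract_mfi_pct < 80:
        return "Moderate"
    if tract_mfi_pct < 120:
        return "Middle"
    return "Upper"


def distress_category(level: str) -> str:
    """Hypothetical collapse of the four levels into the two groups used in this analysis."""
    return "Low/Moderate Income" if level in ("Low", "Moderate") else "Middle/Upper Income"


# Same tract income, two hypothetical areas with different median family incomes
tract_mfi = 60_000
for area_mfi in (45_000, 140_000):
    pct = tract_mfi / area_mfi * 100
    level = ffiec_income_level(pct)
    print(f"{pct:.1f}% of area median -> {level} -> {distress_category(level)}")
```

Because the denominator is the area median, the same tract income can land in opposite groups in different metros. This is why the benchmarks here are computed within Allegheny County.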

# 6.9 "Success story" tracts: low/moderate-income areas with strong outcomes

import plotly.express as px
import plotly.graph_objects as go

# Start from low/moderate-income tracts only
allegheny_success = allegheny[allegheny["Distress_Category"] == "Low/Moderate Income"].copy()

# Descriptive employment stats for low/moderate tracts (reported only, not used as the benchmark)
low_mod_median_employment = allegheny_success["Employment Rate"].median()
low_mod_75th_employment = allegheny_success["Employment Rate"].quantile(0.75)

# Middle/upper-income benchmark
mid_upper_avg_employment = allegheny[
    allegheny["Distress_Category"] == "Middle/Upper Income"
]["Employment Rate"].mean()

print("Employment benchmarks (Allegheny County):")
print("Low/Moderate income tracts:")
print(f"- Median employment rate:     {low_mod_median_employment:.1f}%")
print(f"- 75th percentile:           {low_mod_75th_employment:.1f}%")
print("\nMiddle/Upper income tracts:")
print(f"- Average employment rate:   {mid_upper_avg_employment:.1f}%")

# Flag low/moderate tracts that match or exceed middle/upper employment
allegheny_success["Success_Story"] = (
    allegheny_success["Employment Rate"] >= mid_upper_avg_employment
)

# Benchmark for education
mid_upper_avg_education = allegheny[
    allegheny["Distress_Category"] == "Middle/Upper Income"
]["Bachelors_Plus_Rate"].mean()
allegheny_success["High_Education"] = (
    allegheny_success["Bachelors_Plus_Rate"] >= mid_upper_avg_education
)

# Categorize performance within low/moderate tracts
def categorize_success(row):
    if row["Success_Story"] and row["High_Education"]:
        return "Excelling (High Employment + Education)"
    elif row["Success_Story"]:
        return "Strong Employment"
    elif row["High_Education"]:
        return "Strong Education"
    else:
        return "Struggling"

allegheny_success["Performance_Category"] = allegheny_success.apply(
    categorize_success, axis=1
)

print("\nPerformance category counts (Low/Moderate income tracts):")
print(allegheny_success["Performance_Category"].value_counts())

# Scatter plot: success profiles among low/moderate tracts
fig = px.scatter(
    allegheny_success,
    x="Median Household Income",
    y="Employment Rate",
    size="Bachelors_Plus_Rate",
    color="Performance_Category",
    color_discrete_map={
        "Excelling (High Employment + Education)": "#27ae60",  # Green
        "Strong Employment": "#3498db",                        # Blue
        "Strong Education": "#9b59b6",                         # Purple
        "Struggling": "#e74c3c",                               # Red
    },
    hover_data={
        "Tract Name": True,
        "FFIEC Tract income level (2022)": True,
        "Employment Rate": ":.1f",
        "Bachelors_Plus_Rate": ":.1f",
        "Median Household Income": ":$,.0f",
        "Broadband Rate": ":.1f",
        "Homeownership Rate": ":.1f",
    },
    labels={
        "Employment Rate": "Employment Rate (%)",
        "Median Household Income": "Median Household Income",
        "Bachelors_Plus_Rate": "Bachelor's+ (%)",
        "Performance_Category": "Performance",
    },
    size_max=20,
)

# Reference line: middle/upper-income average employment
fig.add_hline(
    y=mid_upper_avg_employment,
    line_dash="dash",
    line_color="#95a5a6",
    annotation_text=f"Middle/Upper avg employment ({mid_upper_avg_employment:.1f}%)",
    annotation_position="right",
)

fig.update_layout(
    title=dict(
        text=(
            '"Success Stories": Low/Moderate Income Tracts with Strong Outcomes'
            "<br><sub>Green/blue points mark tracts matching or exceeding middle/upper-income benchmarks</sub>"
        ),
        x=0.5,
        xanchor="center",
        font=dict(size=20, family="Arial, sans-serif", color="#2c3e50"),
    ),
    xaxis=dict(
        title="Median Household Income ($)",
        title_font=dict(size=14),
        tickformat="$,.0f",
        gridcolor="#ecf0f1",
        tickfont=dict(size=11),
    ),
    yaxis=dict(
        title="Employment Rate (%)",
        title_font=dict(size=14),
        gridcolor="#ecf0f1",
        tickfont=dict(size=11),
        range=[75, 100],
    ),
    height=650,
    width=1100,
    template="plotly_white",
    plot_bgcolor="white",
    legend=dict(
        title=dict(text="Tract Performance", font=dict(size=13)),
        font=dict(size=11, family="Arial, sans-serif"),
        bgcolor="rgba(255,255,255,0.95)",
        bordercolor="#bdc3c7",
        borderwidth=1,
        x=0.02,
        y=0.98,
        xanchor="left",
        yanchor="top",
    ),
    annotations=list(fig.layout.annotations)
    + [
        dict(
            text=(
                "Bubble size = Bachelor's+ share. "
                "Dashed line shows the middle/upper-income average employment rate."
            ),
            x=0.5,
            y=-0.12,
            xref="paper",
            yref="paper",
            xanchor="center",
            yanchor="top",
            showarrow=False,
            font=dict(size=11, color="#7f8c8d", family="Arial, sans-serif"),
        )
    ],
    margin=dict(t=130, b=100, l=80, r=50),
)

fig.show()

# Simple numeric summary
n_success = allegheny_success["Success_Story"].sum()
n_total = len(allegheny_success)

print(f"\nSummary: {n_success} out of {n_total} low/moderate-income tracts")
print("have employment rates at or above the middle/upper-income average.")
print("These tracts may offer useful lessons about local conditions and interventions.")
Employment benchmarks (Allegheny County):
Low/Moderate income tracts:
- Median employment rate:     92.3%
- 75th percentile:           95.1%

Middle/Upper income tracts:
- Average employment rate:   95.8%

Performance category counts (Low/Moderate income tracts):
Performance_Category
Struggling                                 97
Strong Employment                          16
Strong Education                            4
Excelling (High Employment + Education)     4
Name: count, dtype: int64
Summary: 20 out of 121 low/moderate-income tracts
have employment rates at or above the middle/upper-income average.
These tracts may offer useful lessons about local conditions and interventions.

Interpretation:

Out of 121 low/moderate income tracts in Allegheny County, 20 achieve employment rates matching or exceeding the middle/upper income average of 95.8%. This is evidence that low income does not by itself determine employment outcomes. These "success story" tracts (blue and green points on or above the reference line) span different income levels and geographies, which suggests their success may stem from specific local factors rather than broader regional trends.

The scatter reveals important patterns. The 16 "Strong Employment" tracts (blue) achieve high employment despite below-benchmark education levels (smaller bubbles). This suggests they may have access to stable jobs that don't require a Bachelor's degree, possibly through proximity to specific employers, strong vocational training, or industry clusters. The 4 "Excelling" tracts (green, larger bubbles) combine high employment and high education, which could reflect nearby universities, hospitals, or other anchor institutions creating local opportunity.

However, the majority (97 tracts) fall in the "Struggling" category (red) and, by construction, sit below the 95.8% employment benchmark. Many sit in the 85-92% range, roughly 4-11 percentage points below the middle/upper income average. This isn't catastrophic unemployment, but it represents real exclusion from full labor market participation.

The wide income range among struggling tracts ($15,000-$55,000 median household income) suggests that income level alone does not separate them from the success stories; something else appears to differentiate the outperformers.
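The percentage-point gaps and income ranges cited above can be computed per tract and summarized by category. The sketch below is a minimal version that assumes a DataFrame shaped like `allegheny_success`. Only the 95.8% benchmark comes from the output above; the toy rows are illustrative.

```python
import pandas as pd

# Benchmark reported in the output above (middle/upper-income average employment)
MID_UPPER_AVG_EMPLOYMENT = 95.8

# Toy low/moderate-income tracts (illustrative values, not the real data)
tracts = pd.DataFrame({
    "Performance_Category": ["Struggling", "Struggling", "Struggling", "Strong Employment"],
    "Employment Rate": [85.0, 89.5, 92.0, 96.4],
    "Median Household Income": [15000, 38000, 55000, 41000],
})

# Percentage-point shortfall relative to the benchmark (negative = above benchmark)
tracts["Gap_to_Benchmark_pp"] = MID_UPPER_AVG_EMPLOYMENT - tracts["Employment Rate"]

# One row per category: tract count, income range, and average gap
summary = tracts.groupby("Performance_Category").agg(
    n_tracts=("Employment Rate", "size"),
    min_income=("Median Household Income", "min"),
    max_income=("Median Household Income", "max"),
    mean_gap_pp=("Gap_to_Benchmark_pp", "mean"),
)
print(summary.round(1))
```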

Policy Implication: Rather than only studying why disadvantaged areas fail, policymakers should investigate these 20 outperforming tracts. What do they have? Better transit access? Nearby employers hiring locally? Community colleges or workforce programs? Strong civic organizations? Identifying and replicating those protective factors could be more effective than generic anti-poverty programs.


7. Key Findings

This analysis examined 84,415 census tracts nationwide, with particular focus on Allegheny County's 402 tracts, to test whether systematic opportunity gaps exist across income levels. The evidence supports all five initial hypotheses and reveals patterns with direct policy implications.

Finding 1: Unemployment Gaps Are Severe and Persistent

Low/moderate income tracts face unemployment rates 39-50% higher, in relative terms, than middle/upper income areas (Visualization 6.1). In practical terms, disadvantaged neighborhoods experience roughly 8-11% unemployment while affluent areas hover around 3-5%. This isn't marginal exclusion; it's a systematic labor market failure affecting thousands of working-age adults. The gap holds across Allegheny County, Pennsylvania, and the nation, which points to structural rather than purely local causes.
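Gap figures can be stated in two units that are easy to conflate. A percentage-point gap is the plain difference between two rates. A relative gap divides that difference by a baseline rate. When both rates sit in the single digits, as unemployment rates do, the percentage-point gap can only be a few points, while the relative gap can be large. The rates below are illustrative only, and the function names are hypothetical.

```python
def gap_in_points(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two rates (both in percent), in percentage points."""
    return rate_a - rate_b


def relative_gap_pct(rate_a: float, rate_b: float) -> float:
    """Relative gap in percent, using rate_b as the baseline."""
    return (rate_a - rate_b) / rate_b * 100


# Illustrative unemployment rates in percent (not values from the notebook)
low_mod, mid_upper = 9.0, 4.5
print(gap_in_points(low_mod, mid_upper))     # 4.5 (percentage points)
print(relative_gap_pct(low_mod, mid_upper))  # 100.0 (percent higher)
```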

Finding 2: The Education Divide Is a Bachelor's Degree Cliff

Educational attainment shows the widest disparities of any metric measured. Middle/upper income tracts have 72-84% more Bachelor's degree holders, in relative terms, than low-income areas (Visualizations 6.1 and 6.5). Critically, the gap concentrates at the four-year degree level: Associate's degrees and "some college" show minimal variation across income groups. This suggests either credential inflation or that different post-secondary pathways have very different labor market returns. Upper-income tracts in Allegheny show 23% Bachelor's+ attainment versus just 11% in low-income neighborhoods (a 12-percentage-point gap), leaving the knowledge economy accessible primarily to those already advantaged.

Finding 3: Disadvantage Is Multi-Dimensional, Not Isolated

The heatmap scorecard (Visualization 6.2) shows that struggling tracts don't face just one or two challenges; they score poorly across employment, education, income, broadband, homeownership, and vacancy simultaneously. The uniformity of blue coloring (poor outcomes) in low-income rows versus red coloring (strong outcomes) in upper-income rows points to comprehensive, compounding disadvantage. Single-issue interventions that address employment OR education OR infrastructure in isolation are likely to underperform because these challenges reinforce each other.
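A scorecard like Visualization 6.2 has to put indicators with different units (percent employed, percent vacant) on one color scale. One way to do that is to min-max scale each column across income groups and flip indicators where lower is better, so that 1 always marks the best group. This is sketched as an assumption, not the notebook's exact normalization. The toy means and the `LOWER_IS_BETTER` list are hypothetical. A result like this could then be passed to a heatmap function such as `plotly.express.imshow`.

```python
import pandas as pd

# Toy group means by income tier (illustrative, not the notebook's values)
means = pd.DataFrame(
    {
        "Employment Rate": [90.0, 93.0, 96.0],
        "Vacancy Rate": [15.0, 10.0, 5.0],
    },
    index=["Low", "Moderate", "Upper"],
)

# Indicators where a higher value is worse (assumed list)
LOWER_IS_BETTER = ["Vacancy Rate"]


def scorecard(df: pd.DataFrame, lower_is_better: list) -> pd.DataFrame:
    """Min-max scale each column across groups to 0-1, with 1 = best outcome."""
    scaled = (df - df.min()) / (df.max() - df.min())
    for col in lower_is_better:
        # Flip so the group with the lowest vacancy gets the top score
        scaled[col] = 1 - scaled[col]
    return scaled


print(scorecard(means, LOWER_IS_BETTER))
```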

Finding 4: Allegheny's Unique Challenges Are Broadband and Housing Vacancy

While most metrics track state and national averages, Allegheny County shows distinctive patterns in two areas (Visualization 6.3). Low-income tracts average a 15.6% vacancy rate, higher than the comparable Pennsylvania (12.7%) and national (11.9%) figures, likely a legacy of deindustrialization and population loss. In addition, the broadband rate in low-income Allegheny tracts is only 33.6%, despite the region's tech-hub branding. These are locally actionable problems where targeted investment could narrow gaps faster than addressing nationwide structural inequities like education or unemployment.

Finding 5: Disadvantage Clusters Geographically—It's Not Random

Maps of income classification (Visualization 6.7) and compound disadvantage (Visualization 6.8) reveal clear spatial segregation. Low-income tracts and high-disadvantage hot spots concentrate in connected urban-core clusters, while upper-income, low-disadvantage areas dominate the northern suburbs. The pattern indicates that opportunity deficits aren't scattered randomly; they're geographically concentrated, so place-based interventions targeting specific neighborhoods could efficiently reach large populations facing similar barriers.
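Maps show clustering visually; a global Moran's I statistic is one standard way to quantify it. The NumPy sketch below uses binary contiguity weights on a toy row of four tracts, an illustrative assumption. Values well above the null expectation of -1/(n-1) indicate that similar values sit next to each other. For real inference on Allegheny tracts, a permutation test or a library such as PySAL's `esda` would be more appropriate.

```python
import numpy as np


def morans_i(values: np.ndarray, weights: np.ndarray) -> float:
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    z = values - values.mean()
    n = len(values)
    s0 = weights.sum()  # sum of all spatial weights
    return (n / s0) * (z @ weights @ z) / (z @ z)


# Toy example: four tracts in a row; neighbors share an edge (binary weights)
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

clustered = np.array([1.0, 1.0, 0.0, 0.0])    # similar values are adjacent
alternating = np.array([1.0, 0.0, 1.0, 0.0])  # neighbors always differ

print(morans_i(clustered, W))    # about 0.333, above the null value of -1/3
print(morans_i(alternating, W))  # about -1.0, strong negative autocorrelation
```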

Finding 6: Transit Dependency Reveals Constrained Mobility, Not Preference

Low-income tracts show 7.6% transit commuting versus 2.6% in upper-income areas (Visualization 6.6). This inverse relationship between income and transit use likely reflects dependency rather than choice: many residents may lack vehicle access and be constrained to jobs along transit routes. Combined with poor broadband (limiting remote work options), this creates compounding mobility barriers that restrict employment geography and flexibility.

Finding 7: Success Stories Exist—Disadvantage Isn't Destiny

Twenty low/moderate income tracts in Allegheny County achieve employment rates matching or exceeding the middle/upper income average of 95.8% (Visualization 6.9). These "success story" tracts suggest that protective factors, whether strong local employers, workforce programs, anchor institutions, or community organizing, can enable resilience despite structural disadvantage. Rather than only studying failure, examining what differentiates these outperforming neighborhoods could reveal replicable interventions.


8. Conclusions & Policy Recommendations

What Did This Analysis Show?

This project set out to answer whether economic opportunity is systematically unequal across Allegheny County and, if so, where gaps concentrate and what drives them. The data provides clear, if correlational, answers.

All five hypotheses were supported:

  • H1 (Employment): Low-income tracts show significantly lower employment rates and dramatically higher unemployment—labor market exclusion is real and severe.
  • H2 (Education): The education gap is extreme and concentrates specifically at Bachelor's degree attainment, not across all credentials.
  • H3 (Digital Divide): Broadband access lags substantially in low-income areas, creating barriers to remote work, telehealth, and digital services.
  • H4 (Housing Instability): Low-income tracts face higher vacancy rates and homeownership gaps, indicating both affordability barriers and neighborhood disinvestment.
  • H5 (Clustering): Disadvantage compounds geographically and dimensionally—the same neighborhoods struggle across multiple fronts simultaneously.

Beyond confirming disparities exist, the analysis reveals where they're most acute. Allegheny County largely mirrors national patterns, but shows distinctively higher vacancy rates and lower broadband penetration in disadvantaged areas—problems with clear local solutions. The compound disadvantage map identifies specific census tract hot spots where comprehensive intervention could address overlapping challenges efficiently.

Perhaps most importantly, the "success stories" visualization shows that disadvantage isn't deterministic. Twenty low/moderate-income tracts achieve employment outcomes rivaling affluent neighborhoods, suggesting that local conditions, institutions, and policies can create protective factors even within broader structural constraints.

Policy Recommendations

Based on these findings, five evidence-based interventions could meaningfully narrow opportunity gaps in Allegheny County:

1. Prioritize Broadband Expansion in Low-Income Neighborhoods: The broadband rate in low-income Allegheny tracts is only 33.6%, a fixable infrastructure gap that directly limits remote work, online education, telehealth, and digital services. Unlike education or unemployment, which require long-term systemic change, broadband is a capital investment that can deliver returns comparatively quickly. Target the dark red compound disadvantage hot spots (Visualization 6.8) first, bundling fiber deployment with digital literacy programs to ensure adoption follows access.

2. Focus Education Policy on Bachelor's Degree Completion, Not Just Access: The data shows Associate's degrees and "some college" don't close income gaps; the divide is specifically at four-year degree attainment (Visualization 6.5). This suggests that college access programs which increase enrollment without supporting completion may not improve economic mobility. Investments should target degree completion supports: advising, emergency financial aid, childcare, and flexible scheduling for working adults. Partner with local universities (Pitt, CMU, Duquesne, Point Park) to create pipeline programs from disadvantaged neighborhoods.

3. Implement Place-Based Comprehensive Interventions in Hot Spot Clusters: The compound disadvantage map (Visualization 6.8) identifies specific census tract clusters facing overlapping employment, education, infrastructure, and housing challenges. Rather than scattering resources countywide, concentrate bundled interventions in these dark red zones: simultaneous workforce training, broadband deployment, housing stabilization, and transit improvements. Single-issue programs are unlikely to succeed where challenges compound; comprehensive strategies are needed.

4. Study and Replicate Success Story Tract Conditions: Twenty low/moderate-income tracts achieve employment rates matching or exceeding the middle/upper-income average (Visualization 6.9). Commission case studies of these neighborhoods to identify common protective factors: Are certain employers hiring locally? Do community colleges or workforce programs operate there? Is transit access superior? Are civic organizations stronger? Understanding what enables resilience could produce replicable interventions more effective than generic anti-poverty programs designed without local knowledge.

5. Stabilize Housing Markets Through Anti-Blight and Vacancy Reduction: Allegheny's 15.6% vacancy rate in low-income tracts exceeds state and national averages (Visualization 6.3), signaling neighborhood disinvestment and destabilization. Implement aggressive land banking, strategic demolition of unsalvageable structures, and side-lot programs that transfer vacant parcels to adjacent homeowners. Pair with targeted code enforcement and rehabilitation financing to prevent further decline. Housing stability creates conditions for other investments (broadband, workforce programs) to take root.
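Recommendations 1 and 3 both start from the dark red hot spots. The sketch below turns that starting point into a ranked target list. It assumes a tract table with the notebook's `Disadvantage_Quartile` and `Broadband Rate` columns. The toy rows and the lowest-broadband-first ordering rule are hypothetical, one possible prioritization among many.

```python
import pandas as pd

# Toy tract table using the notebook's column names (values are illustrative)
tracts = pd.DataFrame({
    "Tract Name": ["Tract A", "Tract B", "Tract C", "Tract D"],
    "Disadvantage_Quartile": ["Highest 25%", "Highest 25%", "Lowest 25%", "Highest 25%"],
    "Broadband Rate": [48.0, 31.0, 92.0, 40.0],
})


def broadband_priority(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical rule: keep hot-spot tracts only, lowest broadband rate first."""
    hot = df[df["Disadvantage_Quartile"] == "Highest 25%"]
    return hot.sort_values("Broadband Rate").reset_index(drop=True)


print(broadband_priority(tracts)[["Tract Name", "Broadband Rate"]])
```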

Limitations

Several constraints shape this analysis and its conclusions:

  • Causality vs. Correlation: This is a cross-sectional analysis—it identifies associations between income classification and outcomes but cannot prove causation. Low employment might cause low income, or low income might cause employment barriers, or both might result from third factors (historical redlining, school quality, employer location decisions). Longitudinal data tracking tracts over time would strengthen causal claims.

  • Data Granularity: Census tracts aggregate 1,200-8,000 people, masking within-tract variation. A tract classified as "low income" might contain pockets of stability, while a "middle income" tract might hide severe concentrated poverty. Block group or individual-level data would reveal finer patterns but weren't accessible for this scope.

  • Missing Metrics: Important dimensions of opportunity aren't captured here. Healthcare access, food security, environmental hazards, crime rates, school quality, and social capital all shape economic mobility but lack consistent tract-level data. The analysis focuses on what's measurable, not necessarily what matters most.

  • FFIEC Income Classifications Are Relative: The FFIEC categorizes tracts by their median family income relative to the median family income of their metropolitan area, not by absolute standards. A "low income" tract in Allegheny has different economic conditions than a "low income" tract in San Francisco or rural Appalachia. Cross-metro comparisons require caution.

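  • Static Snapshot: The 2022 ACS 5-year estimates pool survey responses collected from 2018 through 2022, a window that spans pre-pandemic, pandemic, and early-recovery conditions, so they may not reflect long-term trends. Comparing non-overlapping 5-year periods (e.g., 2013-2017 vs. 2018-2022) would help distinguish persistent patterns from temporary shocks.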

  • Policy Implementation Uncertainty: Recommendations assume political will, funding availability, and community support—none guaranteed. Broadband expansion requires capital investment and regulatory cooperation. Education reforms face institutional resistance. Place-based interventions demand sustained multi-year commitment, which electoral cycles often disrupt.
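Comparing ACS periods tract by tract should account for sampling error. The Census Bureau publishes ACS margins of error at the 90% confidence level, so dividing a margin by 1.645 gives a standard error. A difference whose |Z| exceeds 1.645 is statistically significant at that level. The sketch below applies this to two illustrative vacancy estimates. It assumes independent samples (non-overlapping periods) and margins of error already available for the rates being compared. The function name is hypothetical.

```python
import math


def acs_difference_z(est1: float, moe1: float, est2: float, moe2: float) -> float:
    """Z statistic for the difference of two ACS estimates with 90% margins of error."""
    se1, se2 = moe1 / 1.645, moe2 / 1.645  # convert 90% MOEs to standard errors
    return (est1 - est2) / math.sqrt(se1**2 + se2**2)


# Illustrative vacancy rates (percent) for one tract in two non-overlapping periods
z = acs_difference_z(est1=15.0, moe1=2.0, est2=11.0, moe2=2.0)
print(round(z, 2), abs(z) > 1.645)
```

This simple test ignores covariance between the two estimates, which is one reason overlapping 5-year periods should not be compared this way.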

Future Research

This analysis opens several promising avenues for deeper investigation:

1. Longitudinal Tract Analysis (2010-2022): Track how individual census tracts' income classifications and outcomes evolved over 12 years. Because tract boundaries were redrawn after the 2020 Census, this first requires harmonizing 2010 and 2020 tracts (for example, with the Census Bureau's tract relationship files). Which low-income tracts improved? Which declined? What differentiates trajectories? This would reveal whether gaps are widening, stabilizing, or narrowing, and test whether specific investments (e.g., transit extensions, university expansions) correlate with neighborhood change.

2. Success Story Case Studies: Conduct qualitative research in the 20 high-performing low/moderate-income tracts identified in Visualization 6.9. Interview residents, employers, nonprofit leaders, and local officials to understand what protective factors exist. Are there shared characteristics (employer anchors, strong schools, civic organizations) that could be replicated in struggling areas?

3. Employer Location and Job Accessibility Analysis: Overlay tract-level outcomes with employer locations, industry clusters, and commute times. Do low-income tracts with poor employment outcomes have spatial mismatch problems—jobs exist but are geographically inaccessible? This could inform transit planning and workforce transportation subsidies.

4. School Quality and Educational Outcomes: Link census tract education data to specific school catchment areas and performance metrics. Does proximity to high-performing schools predict better educational attainment? This would test whether school quality drives neighborhood education gaps or whether selection (families moving to access good schools) explains patterns.

5. Broadband Speed and Adoption, Not Just Access: This analysis measured whether households have any broadband subscription, not speed or quality. Many low-income households rely on mobile-only internet or DSL connections, which limit functionality. Granular data on speeds, reliability, and devices would better diagnose digital divide severity.

6. Comparative Metro Analysis: Replicate this framework for peer regions (Cleveland, Detroit, Buffalo—other post-industrial metros) to test whether Allegheny's patterns are unique or common. Do all legacy manufacturing cities show similar vacancy clustering? Do tech hubs with universities (Austin, Raleigh) show smaller education gaps? Cross-metro comparison would contextualize findings and reveal which problems are locally solvable versus nationally structural.

7. Policy Intervention Evaluation: If Allegheny County implements any recommended interventions (broadband expansion, place-based programs), design rigorous evaluation frameworks. Use difference-in-differences or matched-comparison designs to measure whether investments actually close gaps or whether improvements would have occurred anyway.
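The evaluation idea in item 7 can be illustrated with a minimal 2x2 difference-in-differences sketch. It compares the before/after change in treated tracts with the change in comparison tracts. The toy broadband panel and the `treated`/`post` column names are hypothetical. The estimate is only credible under the parallel-trends assumption: that treated and comparison tracts would have moved together without the program.

```python
import pandas as pd


def diff_in_diff(df: pd.DataFrame, outcome: str) -> float:
    """2x2 difference-in-differences estimate from group-period means.

    Expects boolean columns 'treated' and 'post' plus the outcome column.
    """
    def cell_mean(treated: bool, post: bool) -> float:
        mask = (df["treated"] == treated) & (df["post"] == post)
        return df.loc[mask, outcome].mean()

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    return treated_change - control_change


# Toy broadband rates (percent) before/after a hypothetical expansion program
panel = pd.DataFrame({
    "treated": [True, True, True, True, False, False, False, False],
    "post":    [False, False, True, True, False, False, True, True],
    "Broadband Rate": [40.0, 44.0, 58.0, 62.0, 50.0, 54.0, 56.0, 60.0],
})
print(diff_in_diff(panel, "Broadband Rate"))
```

Here treated tracts rise by 18 points and comparison tracts by 6, so the estimated program effect is the 12-point difference.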


9. References

  1. U.S. Census Bureau. (2022). American Community Survey 5-Year Estimates. https://api.census.gov/data/2022/acs/acs5
  2. FFIEC. (2022). Census File for Community Reinvestment Act. https://www.ffiec.gov/censusapp.htm
  3. U.S. Census Bureau. (2022). TIGER/Line Shapefiles. https://www.census.gov/geographies/mapping-files.html
  4. ICIC & SRI International. (2024). Connectivity: A New Place-Based Strategy for Distressed Communities and Economic Connectivity Dashboard. https://icic.shinyapps.io/economic_connectivity_dashboard/

Note: Figure layout and subplot structure adapted with assistance from ChatGPT (EDA/data visualization support). Note: Heatmap normalization and hovertext pattern developed with help from ChatGPT. Note: Mapbox choropleth configuration (geometry → geojson pattern) adapted with assistance from ChatGPT.